comment by MisterMentat

MisterMentat · 3604 days ago · link · · parent · post: Scientists of hubski, what science do you science?

I just finished my master's in biostatistics and am about to start my PhD in statistics. You may not consider me a scientist, but I'm the guy that makes sure your science is grounded in actually testable hypotheses and that you're drawing the right conclusions based on the observable data that you've decided to collect.

On a pure statistics level, I'm really interested in Bayesian inference, stochastic computer simulations and computational statistics, machine learning (isn't everyone now, this is becoming a bit cliché), and improving the statistical literacy of the general scientific and lay community.

pikajew · 3604 days ago · link ·

I'm currently an undergrad studying computer science: bioinformatics and biochemistry. In looking for PhD programs somewhat related to bioinformatics, I've seen quite a few biostatistics ones come up. My focus might be genome science with the computer science background being a plus (not sure yet, still figuring things out, lol), but as a biostatistician do you do anything with genome science as a whole? I know, pretty vague overall, but I'm still trying to find where bioinformatics fits into the scientific community as it's a very interdisciplinary line of study.

MisterMentat · 3603 days ago · link ·

I'm not quite sure what bioinformatics is exactly, but I did do a bit of work on genetics data during an internship at the CDC. From my point of view, biostatistics is not bioinformatics. Biostatistics, and statistics as a whole, is a large field, and there are definitely some people who intersect with the informatics realm. It's definitely not an area everyone works in though. If you're interested in genetics and looking at biostatistics programs, I highly recommend Columbia's biostatistics PhD program. I interviewed there, and they had a very large focus on developing methodology for analyzing genetic data. I ended up choosing another program because I wasn't particularly interested in that area and I wanted to do a pure statistics PhD. However, it seems like you'd like it quite a bit.

caeli · 3604 days ago · link ·

"improving the statistical literacy of the general scientific and lay community."

It never fails to astound me how many scientists are just completely statistically illiterate. It's so cool that you're interested in this! What do you think is the best way to improve statistical literacy?

MisterMentat · 3604 days ago · link ·

"It never fails to astound me how many scientists are just completely statistically illiterate."

I know. Sadly, introductory statistics courses are taught without really telling anyone why the subject is cool or how useful it is. I myself hated the statistics courses I took in high school and in my undergraduate studies. In my opinion, people should be much more exposed to how statistics can be applied. I don't really think the best approach is teaching people how to look up t and z statistics on a table. People leave with the ability to say, "do a t-test because there are two groups and that's what we did that one time in class." Then they see a p-value that is "significant." I fucking hate that word.

"It's so cool that you're interested in this! What do you think is the best way to improve statistical literacy?"

The best way, IMO, is to think of things probabilistically. Understanding probability and probabilistic statements is key to understanding statistics. From a young age we are taught the laws of cause and effect. Especially in science classes, we are taught that if we do A, then B always happens. Otherwise it's not a causal relationship. However, in practice, we deal with much more complicated networks of causal relationships. We use randomness as an abstraction to model these complex relationships because it would be impossible to measure every factor in a causal relationship without infinite time, money, and infinitely precise instruments. This is why we see different magnitudes of effects. We don't, and can't possibly, measure everything that would affect the outcome. We use statistics to (hopefully) determine the most likely and most influential causal factors.
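
To make the "randomness as an abstraction" point concrete, here is a rough sketch in Python. Everything in it is made up for illustration: the outcome is assumed to be 2 times a measured variable x plus 20 small factors that nobody records. The analyst only sees x and y, so the unrecorded factors show up as what looks like random noise around a line.

```python
import random
from statistics import mean, pstdev

rng = random.Random(1)  # fixed seed so the sketch is repeatable

def outcome(x, hidden):
    """Hypothetical mechanism: y is fully determined by x and the hidden factors."""
    return 2.0 * x + sum(hidden)

xs, ys = [], []
for _ in range(5_000):
    x = rng.uniform(0, 10)                            # the one factor we measure
    hidden = [rng.uniform(-1, 1) for _ in range(20)]  # factors we never record
    xs.append(x)
    ys.append(outcome(x, hidden))

# Least-squares slope of y on x, using only the measured variable
mx, my = mean(xs), mean(ys)
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)

# What is left over after the fitted line is the part we treat as "random"
residuals = [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]

print(f"estimated slope: {slope:.2f}")
print(f"residual spread: {pstdev(residuals):.2f}")
```

The estimated slope should land close to the assumed value of 2, while the residual spread reflects the combined effect of everything that went unmeasured.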

The statements we make are probabilistic though. Each conclusion we make has a chance of being wrong, no matter how careful we are. In fact, at the 0.05 significance level, we expect about 1 in 20 studies of an effect that doesn't actually exist to find one anyway. This is why replication is important. If multiple independent studies come to the same conclusion, we can be reasonably certain that we made the correct decision. A statistical statement in isolation is not always as concrete as it seems.
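
As a rough illustration of the 1-in-20 figure, here is a minimal simulation sketch in Python. It assumes the simplest possible setup: every simulated study tests a true null hypothesis (mean of 0) with a two-sided z-test and known standard deviation 1. The function names, the sample size of 30, and the seed are arbitrary choices for the example, not anything from a real analysis.

```python
import random
from statistics import NormalDist, mean

def z_test_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-sided p-value for H0: mean == mu0, assuming sigma is known."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def false_positive_rate(n_studies=10_000, n=30, alpha=0.05, seed=0):
    """Fraction of simulated studies that reject a null that is actually true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_studies):
        # The null is true by construction: data really do have mean 0
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if z_test_p_value(sample) < alpha:
            rejections += 1
    return rejections / n_studies

rate = false_positive_rate()
print(f"single-study false positive rate: {rate:.3f}")  # should be near alpha

# If two independent studies each err with probability 0.05,
# the chance that both err is 0.05 * 0.05
print(f"chance two independent studies both err: {0.05 ** 2:.4f}")
```

The simulated rate should hover around 0.05, and the last line shows why independent replication helps: two independent false positives in a row are far less likely than one.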

So I'd recommend, above all, understanding the probabilistic statements made during hypothesis testing and the implications of the cutoffs you select. I'd also recommend being familiar with all of the assumptions that the models you use make. Know why they make them, and know when you've violated the critical ones.

And please, please, please consult with a statistician if you're doing research. Most universities have consulting arms of their departments that are available for collaborative research inside and outside of their university. We would love to work with you! We won't bite. We study this stuff for our whole lives because it's hard. We didn't learn everything about physical chemistry and quantum mechanics in that one class we took in undergrad, and you didn't learn everything about statistics. Let's work together, and hopefully we can avoid some of the pitfalls of our predecessors.