comment by MisterMentat
MisterMentat  ·  3483 days ago  ·  post: Scientists of hubski, what science do you science?

    It never fails to astound me how many scientists are just completely statistically illiterate.

I know. Sadly, introductory statistics courses are taught without really telling anyone why statistics is cool or how useful it is. I myself hated the statistics courses I took in high school and in my undergraduate studies. I think people should be much more exposed to how statistics can be applied. I don't really think the best approach is teaching people how to look up t and z statistics on a table. Students leave with the ability to say, "do a t-test because there are two groups and that's what we did that one time in class." Then they see a p-value that is "significant." I fucking hate that word.

    It's so cool that you're interested in this! What do you think is the best way to improve statistical literacy?

The best way, IMO, is to think of things probabilistically. Understanding probability and probabilistic statements is key to understanding statistics. From a young age we are taught the laws of cause and effect. Especially in science classes, we are taught that if we do A, then B always happens. Otherwise it's not a causal relationship. However, in practice, we deal with much more complicated networks of causal relationships. We use randomness as an abstraction to model these complex relationships because it would be impossible to measure every factor in a causal relationship without infinite time, money, and infinitely precise instruments. This is why we see different magnitudes of effects. We don't, and can't, possibly measure everything that would affect the outcome. We use statistics to (hopefully) determine the most likely and most influential causal factors.

The statements we make are probabilistic though. Each conclusion we make has a chance of being wrong, no matter how careful we are. In fact, among studies where there is truly no effect, we expect about 1 in 20 (at the 0.05 significance level) to report a "significant" result anyway. This is why replication is important. If multiple independent studies come to the same conclusion, we can be reasonably certain that we made the correct decision. A statistical statement in isolation is not always as concrete as it seems.
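To make the 1-in-20 figure concrete, here is a minimal simulation sketch. It only covers the case where there is truly no effect: both groups in every simulated study are drawn from the same normal distribution. Everything specific here is my own assumption for illustration, including the function names, the group size of 30, the known standard deviation of 1, and the use of a simple two-sample z-test instead of a t-test (so it runs with only the standard library).

```python
import random
from statistics import NormalDist, mean


def two_sample_z_pvalue(a, b, sigma=1.0):
    """Two-sided p-value for a difference in means, assuming sigma is known."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_null_study(rng, n=30):
    """One study where both groups come from the same distribution (no effect)."""
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [rng.gauss(0, 1) for _ in range(n)]
    return two_sample_z_pvalue(a, b)


def false_positive_rate(studies=2000, alpha=0.05, seed=1):
    """Fraction of no-effect studies that still come out 'significant'."""
    rng = random.Random(seed)
    hits = sum(simulate_null_study(rng) < alpha for _ in range(studies))
    return hits / studies


def replicated_false_positive_rate(pairs=2000, alpha=0.05, seed=2):
    """Fraction of no-effect study pairs where BOTH studies are 'significant'."""
    rng = random.Random(seed)
    both = 0
    for _ in range(pairs):
        first = simulate_null_study(rng)
        second = simulate_null_study(rng)
        if first < alpha and second < alpha:
            both += 1
    return both / pairs


if __name__ == "__main__":
    print("single study:", false_positive_rate())
    print("original plus replication:", replicated_false_positive_rate())
```

The single-study rate should land near 0.05, and the rate for two independent studies both being wrong should be far smaller (in theory 0.05 × 0.05 = 0.0025). That gap is the whole argument for replication.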

So I'd recommend above all understanding the probabilistic statements made during hypothesis testing, and the implications that the cutoffs you select have. I'd also recommend being familiar with all of the assumptions that the models you use make. Know why they make them, and know when you've violated the critical ones.

And please, please, please consult with a statistician if you're doing research. Most universities have statistics departments with consulting arms that are available for collaborative research inside and outside of the university. We would love to work with you! We won't bite. We study this stuff for our whole lives because it's hard. We didn't learn everything about physical chemistry and quantum mechanics in that one class we took in undergrad, and you didn't learn everything about statistics in yours. Let's work together, and hopefully we can avoid some of the pitfalls of our predecessors.