
- Unfortunately, of course, most effect sizes are not a factor of 20. Indeed, they are usually less than a factor of 2. As we saw in the mask study, the effect size was less than a factor of 1.1. I’m picking on the mask study only because it has been so attention grabbing. It’s a convenient example to illustrate how statistical modeling can muddy the waters in randomized controlled trials. But it is only one of many examples I’ve come across in the past few months. If you pick a random paper out of the New England Journal of Medicine or the American Economic Review, you will likely find similar statistical muddiness.

Smart denizens of Hubski, what are your thoughts?

I'm not normally a stay-in-your-lane kind of guy. I went to grad school for physics and now I run the pharmacology department at a small drug company. Never cared much for staying in my lane. That said, this dude should stay in his lane.

There's a big difference between P-value manipulation and a small effect size, and he is confusing the two pretty starkly. When you design a study, you take everything you know about the data *a priori*, which is usually something about the delta between, say, a pristine sample and your perturbed cohort, and something about the variance of either or both cohorts. Then you decide what effect size **you** would count as meaningful, and you calculate how many subjects you need to observe that effect with, say, an 80% chance of success *if the effect you assumed is real*. That is how good stats is done. And the assumed effect size is meaningful precisely because you chose it to be.
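For anyone who hasn't done one, here's a minimal sketch of that prospective sample-size calculation. It assumes the simplest textbook case: a two-sided comparison of two means with equal group sizes, a common known standard deviation, and the normal approximation. The function name and the example numbers (a 5-unit difference on a 10-unit SD) are just for illustration; a t-based calculator will give a slightly larger answer (64 rather than 63 per arm here).

```python
import math
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Subjects per arm needed to detect a true mean difference `delta`.

    Assumes a two-sided, two-sample test of means, equal arms, common SD
    `sigma`, and the normal approximation (illustrative sketch only).
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)  # quantile that delivers the desired power
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)  # round up: you can't enroll a fraction of a subject


# Hypothetical design: the smallest difference we care about is 5 units,
# and prior data suggest an SD of about 10 units.
print(n_per_group(delta=5.0, sigma=10.0))
```

Because `n` scales with `(sigma / delta) ** 2`, halving the effect size you care about roughly quadruples the number of subjects you need.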

Finding a true mean difference that is so small as to be meaningless often requires more subjects than is feasible to study. So I think that when stats are done correctly (which is to say prospectively, not retrospectively), effect size and statistical significance should be simpatico.
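The flip side can be sketched the same way: a study sized for a meaningful effect has little chance of flagging a trivial one. This sketch assumes the same simplified setup (two-sided two-sample z-test, equal arms, standardized effect size `d` = true difference / SD) and a hypothetical 63 subjects per arm, roughly what an 80%-power design for `d = 0.5` would enroll.

```python
import math
from statistics import NormalDist


def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    d is the standardized effect size (true mean difference / common SD).
    Normal approximation with known, equal variances and equal arms.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * math.sqrt(n_per_group / 2)  # mean of the test statistic under d
    # Probability the statistic lands in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


n = 63  # hypothetical arm size for a study powered at 80% for d = 0.5
for d in (0.5, 0.2, 0.1, 0.0):
    print(f"d = {d:.1f}: power = {power_two_sample(d, n):.2f}")
```

At `d = 0` the "power" collapses to the false positive rate `alpha`, and for effects much smaller than the one the study was designed around it stays not far above that.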

That said, there is a whole other topic of relative vs. absolute risk, and of course there are policy tradeoffs that can't be settled by stats. All one can say is: here's what I set out to measure, and here's what I actually measured. Then it's up to society to figure out what to do about it. If it's, e.g., a new cancer screening tool that reduces deaths from some specific cancer by 70% because it catches it early, you'd say great, why not mandate it. But then you find out that only 1 in 10,000 people develop that type of cancer, so that "70%" actually means less than 0.01% of the population (at most 7 people in 100,000). Well, now you have to think about (a) how much the test costs; (b) how invasive it is; and (c) what the false positive rate is.
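The relative-vs-absolute arithmetic in the screening example is easy to make explicit. This is a small sketch with a hypothetical helper; it treats the 1-in-10,000 incidence as the baseline risk of dying from that cancer, which is an upper-bound assumption (not everyone who develops it would die of it).

```python
def absolute_benefit(baseline_risk, relative_reduction):
    """Convert a relative risk reduction into absolute terms.

    baseline_risk: probability of the bad outcome without the intervention.
    relative_reduction: fraction of those outcomes the intervention prevents.
    """
    arr = baseline_risk * relative_reduction  # absolute risk reduction
    nns = 1 / arr  # number needed to screen to prevent one outcome
    return arr, nns


# Inputs from the screening example; using incidence as the death risk
# is an upper-bound assumption.
arr, nns = absolute_benefit(baseline_risk=1 / 10_000, relative_reduction=0.70)
print(f"absolute risk reduction: {arr:.4%}")   # absolute risk reduction: 0.0070%
print(f"number needed to screen: {nns:,.0f}")  # number needed to screen: 14,286
```

Roughly 14,000 people screened per death prevented is the number that then has to be weighed against cost, invasiveness, and false positives.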

I guess my point is that there's no simple way to judge what the tradeoff between effect size and statistical significance is, because we live in a world where nothing exists in a vacuum. Each new study of each new intervention needs to be evaluated on its own terms in its own reality. Making blanket statements that A matters more than B is plainly wrong.