comment by Cumol

Just as Dendrophobe said, I only know NHST (null hypothesis significance testing).

thundara suggested increasing the n. But what if someone sees a difference after repeating the test 3 times, yet doesn't get a significant result from any test? They then decide to repeat the experiment two more times and, surprise surprise, the p-value is under 0.05.

Is that considered p-value hacking?

Edit: I just checked the reddit front page and found this: a randomized, double-blind, placebo-controlled study on celiac sensitivity.

The p-value was p = 0.034, and the money shot is this figure.

Now the reddit hive-mind is turning against the results. Funny that a community usually considered mainly skeptical also believes what it wants to believe.





thundara  ·  3590 days ago

    Now the reddit hive-mind is turning against the results. Funny that a community usually considered mainly skeptical also believes what it wants to believe.

It's annoying how eager the reddit community can be to have answers. That study you linked had 61 participants, which may be enough to drive further inquiry, but hardly enough to draw conclusions about an entire population, let alone sub-groups within it. Science is generally patient about finding answers, but redditors are quick to jump on preliminary results before they have had a chance to be replicated by other groups and in other populations.
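
For a rough sense of scale (my own back-of-the-envelope sketch, assuming the study reduces to a simple two-group comparison, which may not match its actual design), a standard power calculation says you'd want about 64 participants per group just to reliably detect a medium-sized effect:

    # Sketch: sample size needed for a two-sample t-test to detect a
    # "medium" standardized effect (Cohen's d = 0.5) at alpha = 0.05
    # with 80% power. Assumes a plain two-group comparison.
    from statsmodels.stats.power import TTestIndPower

    n_per_group = TTestIndPower().solve_power(
        effect_size=0.5, alpha=0.05, power=0.8
    )
    print(round(n_per_group))  # ~64 per group

So 61 people split across arms doesn't leave much room, especially once you start slicing into sub-groups.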

thundara  ·  3590 days ago

This would fall under the umbrella of multiple hypothesis testing. In your example, the person runs the experiment 5 times and sees a p-value below 0.05 in only two of them. Purely statistically, the probability of that happening by chance if the null hypothesis is true is the probability of two or more tests coming out significant, which is:

nCr(5, 2) * 0.05^2 * 0.95^3 + nCr(5, 3) * 0.05^3 * 0.95^2 + nCr(5, 4) * 0.05^4 * 0.95^1 + nCr(5, 5) * 0.05^5 * 0.95^0 ~= 0.02, which would be significant (p < 0.05).
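
If you want to sanity-check that arithmetic, here's a quick standard-library Python sketch of the same binomial tail:

    # Probability of k or more "significant" results out of n independent
    # tests when the null is true everywhere (each test has a 5% false
    # positive rate): the upper tail of a Binomial(n, 0.05).
    from math import comb

    def p_at_least(k, n, alpha=0.05):
        return sum(comb(n, i) * alpha**i * (1 - alpha)**(n - i)
                   for i in range(k, n + 1))

    print(p_at_least(2, 5))  # ~0.0226, the ~0.02 quoted above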

However, if they did not define the number of experiments to run in advance, then the result is meaningless and, yeah, it would be p-value hacking. Unfortunately, this crops up in all areas of research, because there are a million reasons an experiment could have failed, and it's easier to bury the runs where things failed and submit the results where p < 0.05 (See also: Publication bias).
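
To make the "decided after the fact" problem concrete, here's a small simulation of the procedure from your example (my own sketch, assuming a two-sample t-test on data with no real effect: peek at the p-value after 3 replicates, and if it isn't significant, add 2 more and test again):

    # Sketch: optional stopping inflates the false positive rate.
    # Both groups are drawn from the SAME distribution, so every
    # "significant" result is a false positive by construction.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    trials, false_positives = 20_000, 0

    for _ in range(trials):
        a, b = rng.normal(size=5), rng.normal(size=5)
        # Peek after 3 replicates per group...
        significant = stats.ttest_ind(a[:3], b[:3]).pvalue < 0.05
        # ...and if not, add 2 more replicates and test again.
        if significant or stats.ttest_ind(a, b).pvalue < 0.05:
            false_positives += 1

    print(false_positives / trials)  # above the nominal 5%

Each individual test keeps its nominal 5% rate; it's the "test again until it works" rule that breaks the guarantee.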

This is also a huge issue in medical research, where clinical trials are frequently not registered beforehand, or results are never submitted after registration. So you end up with a bias toward studies showing that a drug works, and patients are exposed to undue risk (See also: Bad Pharma).

Edit: Fixed bad math