
This would fall under the umbrella of multiple hypothesis testing. In your example, the person runs the experiment 5 times and sees a p-value below 0.05 for only two of them. Purely statistically, the probability of that happening by chance if the null hypothesis is true is the probability of two or more of the tests coming out significant, which is:

nCr(5, 2) * 0.05^2 * 0.95^3 + nCr(5, 3) * 0.05^3 * 0.95^2 + nCr(5, 4) * 0.05^4 * 0.95^1 + nCr(5, 5) * 0.05^5 * 0.95^0 ~= 0.02, which would be significant (p < 0.05).
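If you want to sanity-check that arithmetic yourself, here's a minimal Python sketch of the same binomial tail sum (the variable names are mine, just for illustration):

    from math import comb

    alpha = 0.05   # per-test significance threshold
    n = 5          # number of experiments run

    # Probability of 2 or more "significant" results out of 5 purely by chance,
    # assuming the null hypothesis is true for every test:
    # sum over k = 2..5 of C(5, k) * 0.05^k * 0.95^(5-k)
    p = sum(comb(n, k) * alpha**k * (1 - alpha)**(n - k) for k in range(2, n + 1))
    print(round(p, 4))  # ~0.0226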

However, if they did not define the number of experiments to run in advance, then the result is meaningless and yeah, it would be p-value hacking. Unfortunately, this crops up in all areas of research, because there are a million reasons an experiment could have failed, and it's easier to bury the failed runs and only submit the results where p < 0.05 (See also: Publication bias).

This is also a super big issue in medical research, where clinical trials are frequently not registered beforehand or results are not submitted after being registered. So you end up with a bias towards studies showing a drug works, resulting in patients being exposed to undue risk (See also: Bad Pharma).

Edit: Fixed bad math