Previous post with discussion, 563 days ago: <a href="https://news.ycombinator.com/item?id=7225739" rel="nofollow">https://news.ycombinator.com/item?id=7225739</a><p>PDF via same nature.com: <a href="https://news.ycombinator.com/item?id=8404620" rel="nofollow">https://news.ycombinator.com/item?id=8404620</a><p>Related, dupes of each other:<p><a href="https://news.ycombinator.com/item?id=9463806" rel="nofollow">https://news.ycombinator.com/item?id=9463806</a><p><a href="https://news.ycombinator.com/item?id=9486059" rel="nofollow">https://news.ycombinator.com/item?id=9486059</a><p>Related:<p><a href="https://news.ycombinator.com/item?id=9119228" rel="nofollow">https://news.ycombinator.com/item?id=9119228</a>
From my experience, scientists - at least in biology, where, as in sociology, you may have a lot of noise to deal with - have an internal intuition that a single paper with a significant result does not mean we have found the truth. The recent study reporting a reproducibility rate of about 36% in sociology strikes me as pretty accurate.<p>I think the scientific system can work with that. It means that if you build follow-up experiments on a single paper, there is a good chance the experiment fails. In some ways, the scientific publishing system is self-correcting in this regard, because you can then cast doubt on the previous paper, which is easier to publish than a fresh negative result on its own (p-value > threshold).
It is not that p-values are now bad by definition; it's just that they are often wrongly interpreted. Putting too much confidence in p-values alone can lead to wrong conclusions, and this is what some meta-analyses discover. Many scientists try hard just to reach the "golden" <0.05 in order to claim a discovery and publish it. This is why so many papers mysteriously cluster around 0.05...
Scientists have to do their work in a system that incentivizes bad science. How many people actually get to do their work in an environment that isn't hostile to them?
Isn't a main problem with p-values that you don't know whether significance (a low p-value) is the result of a big effect and a small sample or a big sample and a small effect? This is why you also need a measure of the effect itself, for example the distance between the two measurements in terms of standard deviations.
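To make that concrete, here is a minimal sketch (my own example in Python with NumPy/SciPy, on simulated data): two comparisons can both clear p < 0.05, one driven by a large effect in a small sample and one by a tiny effect in a huge sample, and only a standardized effect size such as Cohen's d (the mean difference in units of the pooled standard deviation) tells them apart.

```python
# Two comparisons with comparable "significance" but wildly different effects.
# Hypothetical simulated data; assumes NumPy and SciPy are installed.
import numpy as np
from scipy import stats

def cohens_d(a, b):
    """Mean difference in units of the pooled standard deviation."""
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (b.mean() - a.mean()) / pooled_sd

rng = np.random.default_rng(0)

# Scenario 1: big effect (1 SD shift), modest samples.
a1, b1 = rng.normal(0.0, 1.0, 40), rng.normal(1.0, 1.0, 40)
# Scenario 2: tiny effect (0.05 SD shift), huge samples.
a2, b2 = rng.normal(0.0, 1.0, 20_000), rng.normal(0.05, 1.0, 20_000)

for label, a, b in [("big effect, n=40", a1, b1),
                    ("tiny effect, n=20000", a2, b2)]:
    t_stat, p_value = stats.ttest_ind(a, b)
    print(f"{label}: p = {p_value:.4f}, Cohen's d = {cohens_d(a, b):.3f}")

# Both comparisons will almost always clear p < 0.05 with these settings,
# but d ~ 1 vs d ~ 0.05 is what tells you whether the difference matters.
```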
I'm probably commenting too late to get my question answered, but here goes: the article has a pretty picture showing how likely your p-values are to mislead you depending on how likely the null hypothesis is. For instance, they say that if you think the null hypothesis has a 50% probability of being right and you get p=5%, then there's still a 29% chance the null hypothesis is true. But according to my calculations, the right number should be 1/21 = 4.8%. What am I missing here? Or are they wrong? My calculations are below:<p>Curious George has 200 fascinating phenomena he wishes to investigate. In reality, 100 of those are real, and the other hundred are mere coincidences. The experiments for the 100 real phenomena all show that "yes, this is for real". (I'm assuming no false negatives.) Most of the 100 experiments that test bogus phenomena show that "this is bogus", but 5 of them come out significant at p=5%, as expected. George then runs off to tell the Man in the Yellow Hat about his 105 amazing discoveries. If Yellow Hat Man knows that half of the phenomena that capture George's attention are bogus, he knows that 5/105 = 1/21 = 4.8% of George's discoveries are likely bogus, even though he doesn't know which ones.
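If it helps, here's a quick simulation of exactly the setup above (my own sketch in Python, using the same assumptions: perfect power for the 100 real effects and a 5% false-positive rate for the 100 bogus ones):

```python
# Monte Carlo version of the Curious George setup: 100 real effects always
# detected (no false negatives), and each of the 100 bogus ones crosses the
# p < 0.05 threshold with probability 0.05.
import numpy as np

rng = np.random.default_rng(42)
trials = 100_000

false_hits = rng.binomial(n=100, p=0.05, size=trials)  # bogus "discoveries" per run
true_hits = 100                                        # perfect power assumed
bogus_share = false_hits / (true_hits + false_hits)

print(f"Average bogus share of discoveries: {bogus_share.mean():.3f}")
# Comes out near 5/105 = 1/21, i.e. about 4.8%, matching the arithmetic above.
```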
Great article. I'm not sure that replication itself will solve the problem, since the Type 1 error rate requires asymptotics. We'd have to run many replications and then show convergence. That will be broadly cost-prohibitive for all but the most important conclusions. Lower thresholds probably won't do it either. Right now, the only solutions I see are:<p>a) Bayesian methods<p>b) Fisher's single-hypothesis method<p>c) Tukey's Exploratory Data Analysis method.<p>d) All of the above.
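To give a feel for (a), here is one small sketch (Python, my own illustration, certainly not the only Bayesian approach): convert the p-value into the -e * p * ln(p) upper bound on the evidence against the null (the minimum Bayes factor of Sellke, Bayarri and Berger) and combine it with a prior. If I understand the article's figure correctly, this is roughly where its "29% at p = 0.05, 11% at p = 0.01" numbers come from.

```python
# Sketch of one Bayesian take on p-values: re-express a p-value as a bound on
# the evidence against the null, then combine it with a prior on the null.
# Uses the -e * p * ln(p) bound (Sellke, Bayarri & Berger, 2001); this is an
# illustration of the idea, not a full Bayesian analysis.
import math

def min_bayes_factor(p):
    """Smallest possible Bayes factor (null vs alternative) for a given p < 1/e."""
    assert 0 < p < 1 / math.e
    return -math.e * p * math.log(p)

def posterior_prob_null(p, prior_null=0.5):
    """Lower bound on P(null | data) given the p-value and a prior on the null."""
    prior_odds = prior_null / (1 - prior_null)
    posterior_odds = prior_odds * min_bayes_factor(p)
    return posterior_odds / (1 + posterior_odds)

for p in (0.05, 0.01):
    print(f"p = {p}: P(null is true) >= {posterior_prob_null(p):.2f}")
# With a 50/50 prior this prints roughly 0.29 for p = 0.05 and 0.11 for p = 0.01.
```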