
Effect size is significantly more important than statistical significance

376 points · by stochastician · over 3 years ago

18 comments

jerf · over 3 years ago

Speaking not to this study in particular necessarily, I strongly agree with the general point. Science has really been held back by an over-focus on "significance". I'm not really interested in a pile of hundreds of thousands of studies that each establish a tiny effect with suspiciously-just-barely-significant results. I'm interested in studies that reveal robust results, reliable enough to be built on to produce other results. Results of 3% variations with p=0.046 aren't. They're dead ends, because you can't put very many of them into the foundations of future papers before the probability of one of your foundations being incorrect grows too large.

To the extent that those are hard to come by... Yeah! They are! Science is hard. Nobody promised this would be easy. Science *shouldn't* be something where labs crank out easy 3%/p=0.046 papers all the time just to keep funding. That's a waste of the money and time of our smartest people. It *should* be harder than it is now.

Too many proposals are obviously only capable of turning up that kind of result (insufficient statistical power is often obvious right in the proposal, if you take the time to work the math). I'd rather see more wood behind fewer arrows: fewer proposals chasing much more statistical power, instead of the chaff of garbage we get now.

If I were King of Science, or at least editor of a prestigious journal, I'd put the word out that I'm looking for papers with at least one of a *significant* effect size or a p value of something like p = 0.0001. Yeah, that's a high bar. I know. That's the point.

"But jerf, isn't it still valuable to map out all the little things like that?" No, it really isn't. We already have every reason in the world to believe the world is *drenched* in 1%/p=0.05 effects. "Everything is correlated with everything", so that's not some sort of amazing find; it's the totally expected output of living in our reality. Really, this sort of stuff is still just *below the noise floor*. Plus, the idea that we can remove such small, noisy confounding factors is just silly. We need to look for the things that stand out from that noise floor, not spend billions of dollars doing the equivalent of listening to our spirit guides communicate to us over white noise from the radio.
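jerf's point that "insufficient statistical power is often obvious right in the proposal, if you take the time to work the math" can be made concrete with the standard normal-approximation sample-size formula. This is a sketch of my own, not from the comment, using conventional alpha = 0.05 and 80% power:

```python
from math import ceil
from statistics import NormalDist  # stdlib; no SciPy needed

def n_per_group(d, alpha=0.05, power=0.8):
    """Approximate sample size per group for a two-sample z test
    to detect a standardized effect size d."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-sided
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)

print(n_per_group(0.03))  # tiny effect: roughly 17,000+ subjects per group
print(n_per_group(0.8))   # large effect: only a couple dozen per group
```

Under these assumptions, a d = 0.03 effect needs on the order of seventeen thousand subjects per group, while d = 0.8 needs about twenty-five: that is the arithmetic behind spotting an underpowered proposal before any data are collected.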
robocat · over 3 years ago

From the article:

Ernest Rutherford is famously quoted as proclaiming, "If your experiment needs statistics, you ought to have done a better experiment."

"Of course, there is an existential problem arguing for large effect sizes. If most effect sizes are small or zero, then most interventions are useless. And this forces us scientists to confront our cosmic impotence, which remains a humbling and frustrating experience."
exporectomy · over 3 years ago

I wonder if we should separate the roles of scientist and researcher. Universities would have generalist "scientists" whose job would be to consult for domain-specialized researchers to ensure they're doing the science and statistics correctly. That way, we wouldn't need every researcher in every field to have a deep understanding of statistics, which they often don't.

Either that, or stop rewarding such bad behavior. Science jobs are highly competitive, so why not exclude people with weak statistics? Maybe because weak statistics leads to more spuriously exciting publications, which makes the researcher and institution look better?
abeppu · over 3 years ago

I think the weird thing is that a bunch of people in tech understand this well *with respect to tech*, but often fall into the same p-value trap when reading about science.

If you're working with very large datasets generated from, e.g., a huge number of interactions between users and your system, whether as a correlation after the fact or as an A/B experiment, getting a statistically significant result is easy. Getting a meaningful improvement is rarer, and gets harder after a system has received a fair amount of work.

But then people who work in these big-data contexts can read about a result outside their field (e.g. nutrition, psychology, whatever), where n=200 undergrads or something, p=0.03 (yay!), and some pretty modest effect, and be taken in by whatever claim is being made.
RandomLensman · over 3 years ago

These discussions are fun but rather pointless: sometimes a small effect is really interesting, but it needs to be pretty strongly supported (for instance, a claimed 1% higher electron mass, or a 2% survival rate in rabies).

Also, most published research is inconsequential, so it really does not matter beyond the money spent (and that is not only related to findings but also to keeping people employed, etc.). If confidence in results is truly an objective, we might need to link it directly to personal income or loss of income, i.e. force bets on it.
versteegen · over 3 years ago

If you have a tiny effect size on X, you probably haven't discovered a significant cause of X, just something incidental.

For example, smoking was finally proved to cause lung cancer because the effect size was so large that the argument that 'correlation does not imply causation' became absurd: it would have required the existence of a genetic or other common cause Z that both causes people to smoke and causes them to develop cancer, with correlations at least as large as those between smoking and lung cancer, but there just isn't anything correlated that strongly. It would imply that almost everyone who smokes heavily does so because of Z.
hammock · over 3 years ago

> Effect Size Is Significantly More Important Than Statistical Significance

Ok, but by how much?
ummonk · over 3 years ago
Agree with the title, but not the contents. The study in question is actually an example of a huge effect size (10% reduction in cases just from instructing villages they should wear masks is amazing) possibly hampered by poor statistical significance (as the blog post outlines).
georgewsinger · over 3 years ago

HN comments are usually the place for spicy contrarian takes on OP, but *this post is dead on*.

Low effect sizes are often a code smell for scientific incrementalism/stagnation.
agnosticmantis · over 3 years ago

An investigator needs to rule out all conceivable ways their modeling can go wrong, among them the possibility of a statistical fluke, which statistical significance is supposed to take care of. So statistical significance may best be thought of as a necessary condition, but it is typically taken to be a sufficient condition for publication. If I see a strange result (p-value < 0.05), could it be because my functional form is incorrect? Because I added/removed some data? Or because I failed to include an important variable? These are hard questions, not amenable to algorithmic application and mass production. Typically these questions are ignored, and only the possibility of a statistical fluke is ruled out (which itself depends on the other assumptions being valid).

Dave Freedman's "Statistical Models and Shoe Leather" is a good read on why such formulaic application of statistical modeling is bound to fail. [0]

[0] https://psychology.okstate.edu/faculty/jgrice/psyc5314/Freedman_1991A.pdf
fmajid · over 3 years ago

The studies are in villages, but the real concern is dense urban environments like New York (or Dhaka), where people are tightly packed together and at risk of contagion. I'm pretty sure masks make little difference in Wyoming either, where the population density is 5 people per square mile.
mrtranscendence · over 3 years ago

> If most effect sizes are small or zero, then most interventions are useless.

But this doesn't necessarily follow, does it? If there really were a 1.1-fold reduction in risk due to mask-wearing, it could still be beneficial to encourage it. The salient issue (taking up most of the piece) seems to be not the size of the effect but rather the statistical methodology the authors employed to measure that size. The p-value isn't meaningful in the face of an incorrect model, so why isn't the answer a better model rather than just giving up?

Small effects are everywhere. Sure, it's harder to disentangle them, but they're still often worth knowing.
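The absolute-benefit side of this argument can be sketched with back-of-envelope arithmetic. The population and baseline-risk figures below are illustrative assumptions of mine, not numbers from the study:

```python
def cases_averted(population, baseline_risk, fold_reduction):
    """Cases averted when a relative (fold) risk reduction is applied
    uniformly across an exposed population."""
    baseline_cases = population * baseline_risk
    reduced_cases = population * baseline_risk / fold_reduction
    return baseline_cases - reduced_cases

# Hypothetical: 10 million people, 1% baseline attack rate, 1.1-fold reduction.
print(round(cases_averted(10_000_000, 0.01, 1.1)))  # about 9,000 cases averted
```

A "small" 1.1-fold effect still averts thousands of cases at population scale, which is the sense in which small effects can be worth knowing, provided the estimate itself is trustworthy.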
kbrtalan · over 3 years ago

There's a whole book about this idea, Antifragile by Nassim Taleb. Highly recommended.
_Nat_ · over 3 years ago

The title's misinformation: effect size *ISN'T* more important than statistical significance.

The article itself makes some better points, e.g.

> I worry that because of statistical ambiguity, there's not much that can be deduced at all.

which would seem like a reasonable interpretation of the study the article discusses.

However, the title alone asserts a general claim about statistical interpretation that seems potentially harmful to the community. Specifically, it would be pretty bad for someone to see the title and internalize the notion that effect size is more important than statistical significance.
sanxiyn · over 3 years ago

Masks' effect size on seroprevalence is probably zero, so no effect is the expected result.

That's because masks act on R0, not on seroprevalence. After acting on R0: if R0 is >1, exponential growth; if <1, exponential decay. So, no effect, unless masking is the thing that pushes R0 from >1 to <1.
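The threshold behavior sanxiyn describes can be sketched with a toy geometric-growth model. This is my own illustration, not an epidemiological simulation, and the R0 and mask-effect numbers are assumptions:

```python
def infections_after(r0, generations, seed=100):
    """Total infections in a naive geometric-growth model where each
    generation of cases produces r0 times as many new cases."""
    cases = seed
    total = 0
    for _ in range(generations):
        total += cases
        cases *= r0
    return total

mask_factor = 0.9  # assume masks cut transmission by 10%

print(infections_after(1.3, 50))                # no masks: explosive growth
print(infections_after(1.3 * mask_factor, 50))  # R0 = 1.17: still explosive growth
print(infections_after(1.05 * mask_factor, 50)) # R0 = 0.945: outbreak dies out
```

The same 10% multiplicative effect is nearly invisible in the first pair and decisive in the second, because only there does it push R0 across the threshold of 1.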
Ice_cream_suit · over 3 years ago

The better medical journals do stress the hazard ratio, efficacy, and the confidence interval.

See the extract below from the NEJM article "Seasonal Malaria Vaccination with or without Seasonal Malaria Chemoprevention":

"The hazard ratio for the protective efficacy of RTS,S/AS01E as compared with chemoprevention was 0.92 (95% confidence interval [CI], 0.84 to 1.01), which excluded the prespecified noninferiority margin of 1.20.

The protective efficacy of the combination as compared with chemoprevention alone was 62.8% (95% CI, 58.4 to 66.8) against clinical malaria, 70.5% (95% CI, 41.9 to 85.0) against hospital admission with severe malaria according to the World Health Organization definition, and 72.9% (95% CI, 2.9 to 92.4) against death from malaria.

The protective efficacy of the combination as compared with the vaccine alone against these outcomes was 59.6% (95% CI, 54.7 to 64.0), 70.6% (95% CI, 42.3 to 85.0), and 75.3% (95% CI, 12.5 to 93.0), respectively."

https://www.nejm.org/doi/full/10.1056/NEJMoa2026330?query=featured_home
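For readers unfamiliar with the reporting convention in the quote above, protective efficacy is conventionally derived from the hazard ratio as (1 - HR) x 100%, with the CI bounds transforming the same way. A minimal sketch of that arithmetic (mine, not from the comment):

```python
def efficacy_pct(hazard_ratio):
    """Protective efficacy (%) from a hazard ratio, per the usual
    (1 - HR) x 100% reporting convention."""
    return (1 - hazard_ratio) * 100

print(efficacy_pct(0.92))  # HR 0.92 corresponds to about 8% efficacy
print(efficacy_pct(0.372)) # an HR of 0.372 would read as 62.8% efficacy
```

This is why a CI on the hazard ratio that crosses 1 (e.g. 0.84 to 1.01) corresponds to an efficacy CI that crosses 0%.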
ammon · over 3 years ago
But how much more important? :) Sorry, could not help myself.
nabla9 · over 3 years ago

If you have one BALB/c lab mouse, you give it something, and it glows in the dark a few months later, the effect size alone makes it significant.