ML is useful for many things, but not for predicting scientific replicability

116 points by randomwalker almost 2 years ago

7 comments

morkalork almost 2 years ago
From the OG paper:

> Our machine learning model used an ensemble of random forest and logistic regression models to predict a paper's likelihood of replication based on the paper's text.

> We trained a model using word2vec on a corpus of 2 million social science publication abstracts published between 2000 and 2017

> converting publications into vectors. To do this, we multiplied the normalized frequency of each word in each paper in the training sample by its corresponding 200-dimension word vector, which produced a paper-level vector representing the textual content of the paper

If you took a paper and rearranged the words to have a completely different meaning, their method would produce the same prediction. It also has no understanding of, or the ability to differentiate between, quotations and references within the paper and content written by the authors themselves. Good luck with that! It's basically just learning some known shitty combinations of keywords.
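A minimal sketch of the scheme described in those quotes (frequency-weighted word2vec averaging fed into a random-forest/logistic-regression ensemble), with toy made-up vectors and labels standing in for the paper's actual data and code, just to make the order-insensitivity concrete:

```python
from collections import Counter

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

DIM = 200
rng = np.random.default_rng(0)

# Toy "pretrained" word vectors and toy labels; in the paper these come from
# word2vec trained on ~2M social-science abstracts and from replication studies.
papers = [
    "we find a significant effect of priming on behavior",
    "we find no effect of priming on behavior",
]
labels = [0, 1]  # stand-ins for "did not replicate" / "replicated"
vocab = set(" ".join(papers).split())
word_vectors = {w: rng.normal(size=DIM) for w in vocab}

def paper_vector(text):
    """Frequency-weighted sum of word vectors: order-insensitive by construction."""
    tokens = [t for t in text.lower().split() if t in word_vectors]
    counts = Counter(tokens)
    total = sum(counts.values())
    return sum((n / total) * word_vectors[w] for w, n in counts.items())

X = np.vstack([paper_vector(p) for p in papers])
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, labels)
lr = LogisticRegression(max_iter=1000).fit(X, labels)

# Rearranging the words changes the meaning but not the bag of words,
# so the paper vector -- and therefore the prediction -- is identical.
scrambled = "priming we find a significant effect of on behavior"
assert np.allclose(paper_vector(scrambled), paper_vector(papers[0]))

v = paper_vector(scrambled).reshape(1, -1)
score = (rf.predict_proba(v)[:, 1] + lr.predict_proba(v)[:, 1]) / 2
print("predicted replication probability:", score[0])
```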
leedrake5 almost 2 years ago
> They found that the model relied on the style of language used in papers to judge if a study was replicable.

I think the failure here is training the model on published results. I've worked with scientists who could write and infer a marvelous amount of information from shit data. And I've worked with scientists who poorly described ingenious methods with quality data. The current academic system incentivizes sensationalizing incremental advances that confirm previously published work. I'm not in the least surprised that replication would fail at the manuscript level.

The proper way to do this would be to log experimental parameters in a systematic reporting method specific to each field. With standardized presentation of parameters, I suspect replicability would improve. This would require a near-impossible degree of coordination between different research groups, but it would be feasible for the NIH or NSF to demand such standardized logging as a condition of grant awards above a certain size.
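For illustration, a toy sketch of what such a standardized, machine-readable parameter log might look like; every field name here is invented, not taken from any existing NIH or NSF reporting standard:

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Optional

@dataclass
class ExperimentRecord:
    """Hypothetical per-experiment report a funder could require alongside a manuscript."""
    instrument: str
    sample_size: int
    protocol_doi: str                 # exact protocol version used
    materials: dict                   # reagent/lot identifiers, etc.
    analysis_code_url: str
    random_seed: Optional[int] = None
    deviations: list = field(default_factory=list)

record = ExperimentRecord(
    instrument="XRF spectrometer, model X-123",
    sample_size=42,
    protocol_doi="10.0000/example.protocol.v3",
    materials={"standard": "NIST-610", "lot": "A-17"},
    analysis_code_url="https://example.org/lab/analysis-code",
    random_seed=1234,
    deviations=["run 7 repeated after calibration drift"],
)
print(json.dumps(asdict(record), indent=2))
```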
personjerry almost 2 years ago
I feel like this is another way of saying "past data can't predict the future"
1letterunixname almost 2 years ago
Two 800-lb. gorilla truths:

0. AGI is a remote, distant possibility borne of science fiction, more remote than commercial fusion power or ubiquitous flying cars.

1. Narrow AI (software) will eat all areas that are reducible, but it cannot completely replace the interactive reasoning of a subject matter expert. That's why there will never be AI lawyers, mechanical engineers, or civil engineers delivering statute or code interpretation reports.

------

IIRC (or someone was pulling my leg ¯\_(ツ)_/¯ ), BMIR at Stanford was doing NLP ML of medical &| biomedical informatics papers and trying to draw new conclusions from automated meta-analyses of existing papers.
sitkack almost 2 years ago
How do you prove a negative?

My own heuristic works pretty well: if the artifacts for the paper are available, and they include either a repo or a Docker image, I'm going to say the paper is probably reproducible. Or, if the paper, instead of being exactly 10 pages, is 20 or more and has an extensive appendix on the methods used, it also has a high likelihood of being reproducible, as does a paper that includes links to datasets.
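Roughly, that heuristic amounts to something like the sketch below; the signals and the two-of-three threshold are made up for illustration, not a validated rule:

```python
def looks_reproducible(has_repo_or_docker_image: bool,
                       page_count: int,
                       has_methods_appendix: bool,
                       links_to_datasets: bool) -> bool:
    """Rule-of-thumb check based on a paper's artifacts rather than its text."""
    signals = [
        has_repo_or_docker_image,                  # code is actually runnable somewhere
        page_count >= 20 or has_methods_appendix,  # methods described in real detail
        links_to_datasets,                         # data available to rerun the analysis
    ]
    return sum(signals) >= 2

print(looks_reproducible(True, 12, False, True))    # True
print(looks_reproducible(False, 10, False, False))  # False
```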
blitzar almost 2 years ago
> Earlier this year, a paper in the widely read PNAS journal raised the possibility of detecting non-replicable findings using machine learning (ML).

I wonder if they faked their paper too.
godelski almost 2 years ago
I would be VERY impressed if __any__ statistical model[0] could __ever__ reliably predict replicability. I just do not believe that there is enough information in papers to do this. I mean, even machine learning papers, where algorithms are given and code and checkpoints are handed out, frequently have replicability issues! I just do not believe it is possible to put such information into writing.

Plus, it is missing two key points about replicability. First, just because you can type a recipe doesn't mean that the recipe is correct. I can tell you how to bake a cake, but if I forget to tell you to add yeast (or the right amount) then you won't get a cake. A paper that looks replicable would be indistinguishable from one that isn't. You literally have to be an expert paying close attention to determine this.

Second off, a big part of replication is that there's variance. You CANNOT ever replicate in exactly the same conditions, and that's what we want! It helps us find unknown and hidden variables. Something confounds or couples with something else and it can be hard to tell, but the difference in setting helps unearth that. You may notice this in some of the LK-99 works, how they aren't using the same exact formula as the original authors but "this should have the same result."

Instead, I think we're all focusing on the wrong problem here. We have to take a step back and look at things a bit more abstractly. As the article even suggests, works never get replicated because it is time consuming. So why is the time not worthwhile? Because replicating a result doesn't get you anything or advance your career, despite replication being a literal cornerstone of science! Instead our model is to publish in top venues, publish frequently, and publishing in those venues requires "novelty" (whatever that means).

We live in Goodhart's Hell, and I think academia makes it much clearer. A bunch of bean counters need numbers on a spreadsheet to analyze things. They find correlations, assert that the correlations are causations, make those the goalposts, and then people hack those metrics. It is absolutely insane and it leaves a hell of a lot of economic inefficiency on the table. I'm sure if you think about this you can find this effect is overwhelmingly abundant in your life and especially at work (it is why it looks like your boss read the CIA's Simple Sabotage Field Manual and thought it was full of great ideas).

Here's how you fix it, though: procedurally. You ask yourself what your goals are. You then ask yourself if those goals are actually what your intended goals are (subtle, but very different). Then you ask yourself how those can be measured, as well as __if__ they can be measured. If they can't be measured exactly then you continue forward, but with extra caution and the need to constantly ask yourself if these metrics are being hacked (hint: they will be, but not always immediately). In fact, the more susceptible your metric is to hacking, the more work you have in store for yourself. You're going to need to make a new metric that tries to measure the hacking and tries to measure the alignment with the actual desired outcome. In ML we call all this "reward hacking." In business we call this bureaucracy.

The irony of this all is that, more often than not, implementing a metric in the first place puts you in a worse position than if you had just let things run wild. The notion sounds crazy, but it does work out. The big driver is that there's noise in the system. People think you can decouple this noise, but it is often inherent. Instead you need to embrace the noise and make it part of your model. But if we're talking about something like academia, there's too much noise to even reliably define good metrics. Your teaching output can't be quantified without tracking students over at least a decade, by which point they've been influenced by many other things (first job isn't nearly enough). For research, it can take decades for a work to become worthy of a Nobel, even after it's published. Most good work takes a lot of time, and if you're working on a quarterly milestone you're just making "non-novel" work that has to pass through a vague and noisy novelty filter. You wonder why the humanities write about a lot of BS? Well, take this effect over 50 years and it'll make perfect sense. The more well studied an area is, the crazier it is now.

So here's how you fix it in practice: let the departments sort it out. They are small enough that accountability is possible. They know who is doing good work and working hard, regardless of tangible output. If your department is too big, make subgroups (these can be about 100 people, so they can be pretty big). Get rid of this joke thing called journals. They were intended to make distribution easier and have never been able to verify a work by reading it (a reviewer is only capable of determining if a work is invalid or indeterminate, not valid). Use your media arm to celebrate works that include replication! Reward replication! It's all a fucking PR game anyways. If your PR people can't spin that into people getting excited, hire new PR people. If Apple can sell us on groundbreaking and revolutionary $700 caster wheels and a $1k monitor stand, you can absolutely fucking sell us on one of the cornerstones of science. Hell, we've been seeing that done with LK-99.

/rant

[0] without actually simulating the algorithms and data in the study, which we might just call replication...