I would be VERY impressed if __any__ statistical model[0] could __ever__ reliably predict replicability. I just do not believe there is enough information in papers to do this. Even in machine learning, where the algorithms are written out and code and checkpoints are handed out, papers frequently have replicability issues! I just do not believe it is possible to put all the necessary information into writing.<p>Plus, this misses two key points about replicability. First, just because you can write down a recipe doesn't mean the recipe is correct. I can tell you how to bake a cake, but if I forget to tell you to add the yeast (or the right amount), you won't get a cake. On paper, that recipe looks fine: a paper that merely looks replicable is indistinguishable from one that actually is. You literally have to be an expert paying close attention to tell the difference.<p>Second, a big part of replication is variance. You CANNOT ever replicate under exactly the same conditions, and that's what we want! It helps us find unknown and hidden variables. When something confounds or couples with another variable it can be hard to tell, but the difference in setting helps unearth that. You may notice this in some of the LK-99 works: they aren't using the exact same formula as the original authors, but "this should have the same result."<p>Instead, I think we're all focusing on the wrong problem here. We have to take a step back and look at things a bit more abstractly. As the article even suggests, work rarely gets replicated because it is time consuming. So why is the time not worthwhile? Because replicating a result doesn't get you anything or advance your career, despite replication being a literal cornerstone of science! Instead the model is to publish in top venues, publish frequently, and those venues require "novelty" (whatever that means).<p>We live in Goodhart's Hell, and I think academia makes it much clearer. A bunch of bean counters need numbers on a spreadsheet to analyze things. They find correlations, assert that the correlations are causations, make those the goalposts, and then people hack those metrics. It is absolutely insane and it leaves a hell of a lot of economic efficiency on the table. I'm sure if you think about it you'll find this effect is overwhelmingly abundant in your life and especially at work (it is why it looks like your boss read the CIA's Simple Sabotage Field Manual and thought it was full of great ideas).<p>Here's how you fix it, though: procedurally. You ask yourself what your goals are. You then ask yourself whether those goals are actually your intended goals (subtle, but very different). Then you ask how they can be measured, as well as __if__ they can be measured. If they can't be measured exactly, you continue forward, but with extra caution, and you need to constantly ask yourself whether the metrics are being hacked (hint: they will be, just not always immediately). In fact, the more susceptible your metric is to hacking, the more work you have in store for yourself: you're going to need a new metric that tries to measure the hacking and the alignment with the actual desired outcome. In ML we call all of this "reward hacking." In business we call it bureaucracy.
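<p>To make the reward-hacking point concrete, here's a toy sketch in Python (mine, not anything from the article; `true_value` and `proxy_score` are made-up stand-ins) of a fixed effort budget being shifted from careful work into gaming a metric:

    # Toy illustration of Goodhart's law / reward hacking.
    # A fixed effort budget is split between doing careful work and
    # gaming a proxy metric (think paper count). The proxy keeps
    # climbing while the thing we actually care about falls.

    def true_value(care_effort: float) -> float:
        """What we actually want: the quality of the science."""
        return care_effort ** 0.5      # diminishing returns on care

    def proxy_score(gaming_effort: float) -> float:
        """What the spreadsheet sees: publications, citations, etc."""
        return 3.0 * gaming_effort     # gaming pays off linearly

    for gaming in (0.0, 0.25, 0.5, 0.75, 1.0):
        care = 1.0 - gaming            # whatever isn't gamed is careful work
        print(f"gaming={gaming:.2f}  proxy={proxy_score(gaming):.2f}  "
              f"true value={true_value(care):.2f}")

The proxy rises monotonically while the true value falls: once the measure becomes the target, maximizing it tells you nothing about the thing it was supposed to track.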
<p>The irony of all this is that, more often than not, implementing a metric in the first place puts you in a worse position than if you had just let things run wild. The notion sounds crazy, but it works out that way. The big driver is that there's noise in the system. People think you can decouple this noise, but it is often inherent. Instead you need to embrace the noise and make it part of your model.<p>And if we're talking about something like academia, there's too much noise to even reliably define good metrics. Teaching output can't be quantified without tracking students for at least a decade, by which point they've been influenced by many other things (the first job isn't nearly enough). For research, it can take decades for a work to become worthy of a Nobel, even after it's published! Most good work takes a lot of time, and if you're working on quarterly milestones you're just producing "non-novel" work that then has to pass through a vague and noisy novelty filter. You wonder why the humanities write about a lot of BS? Run this effect for 50 years and it makes perfect sense. The more thoroughly studied an area is, the crazier the work has to get to look novel.<p>So here's how you fix it in practice: let the departments sort it out. They are small enough that accountability is possible. They know who is doing good work and working hard, regardless of tangible output. If your department is too big, make subgroups (these can be around 100 people, so they can be pretty big). Get rid of this joke of a thing called journals: they were intended to make distribution easier, and reading a work has never been able to verify it (a reviewer can only determine that a work is invalid or indeterminate, never that it is valid). Use your media arm to celebrate work that includes replication! Reward replication! It's all a fucking PR game anyways. If your PR people can't spin that into people getting excited, hire new PR people. If Apple can sell us on groundbreaking and revolutionary $700 caster wheels and a $1k monitor stand, you can absolutely fucking sell us on one of the cornerstones of science. Hell, we've been watching that happen with LK-99.<p>/rant<p>[0] without actually simulating the algorithms and data in the study, which we might just call replication...