The issue talked about here is distinct from the larger "reproducibility crisis"; the latter is the result of shoddily designed (or simply fraudulent) *experimental* work, whereas the issue here is the aggregate effect of the huge amount of *computational* work being done, even when that work is done correctly and honestly.

Testing a hypothesis against a pre-existing dataset is a valid thing to do, and it is also almost trivially simple (and completely free) for someone with a reasonable computational background. There are researchers who spend a decent portion of their careers performing these analyses. This is all well and good (we want people analyzing the highly complex data that modern science produces), but we run into problems with the statistics.

Suppose an analyst can test a hundred hypotheses per month (probably a low estimate). Each analysis, simplifying slightly, ends with a significance test that returns a p-value: the probability of seeing a result at least this strong if the hypothesis were actually false. If p < 0.01, the researcher writes up the analysis and sends it off to a journal for publication, since the chance of a false hypothesis clearing that threshold is *literally* one in a hundred. But you see the problem: even if we assume that this researcher tests *no valid hypotheses at all* over the course of a year, we would expect them to send out one paper per month, and each of those papers would look perfectly sound, with no methodological flaws for reviewers to complain about. (A quick simulation at the end of this comment makes the arithmetic concrete.)

In reality, of course, researchers sometimes test true hypotheses, and the ratio of true to false computational-analysis papers depends on the ratio of "true hypotheses that analysis successfully catches" to "false hypotheses that squeak by under the p-value threshold" (roughly, the True Positive rate vs. the False Positive rate, weighted by how often true hypotheses get tested in the first place). It's hard to guess what this ratio is, but if AAAS is calling things a "crisis," it's clearly lower than we would like.

But there's a further problem: the obvious solution, lowering the p-value threshold for publication, would lower *both* the False Positive rate and the True Positive rate. The p-value assigned to an analysis of a *true* hypothesis is limited by the statistical power (essentially, the size and quality) of the dataset being analyzed; lower the threshold too much, and analysts simply won't be able to make a sufficiently convincing case for any given true hypothesis. It's not a given that there is any p-value threshold at which the True Positive/False Positive ratio is much better than it is now.

"More data!" is the other commonly proposed solution, since we can safely lower the p-value threshold if we have the data to back up true hypotheses. But even if we ramp up experimental throughput enough to produce True Positives at p < 0.0001, that just means computational researchers can explore more complicated hypotheses, until they're testing thousands or millions of hypotheses per month, and then we have the same problem. In a race between "bench work" and "human creativity plus computer science," I know which I'd bet on.
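
To make the arithmetic concrete, here's a minimal simulation sketch in Python. Every number in it is an assumption chosen for illustration (the fraction of hypotheses that are true, the effect size, and the sample size per analysis are made up, not taken from anywhere); it models each analysis as a one-sided z-test and counts how many true and false hypotheses clear various p-value thresholds over a year of work.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# All parameters below are assumptions for illustration only.
months = 12
tests_per_month = 100   # the "hundred hypotheses per month" above
true_fraction = 0.05    # assumed: 1 in 20 tested hypotheses is actually true
effect_size = 0.3       # assumed standardized effect size for true hypotheses
n = 50                  # assumed observations per analysis (drives power)

n_tests = months * tests_per_month
is_true = rng.random(n_tests) < true_fraction

# Each analysis is modeled as a one-sided z-test: the test statistic is
# ~N(0, 1) when the hypothesis is false and ~N(effect_size * sqrt(n), 1)
# when it is true.
z = rng.normal(loc=np.where(is_true, effect_size * np.sqrt(n), 0.0), scale=1.0)
p = norm.sf(z)          # one-sided p-value

for alpha in (0.05, 0.01, 0.001, 0.0001):
    published = p < alpha
    false_pos = int(np.sum(published & ~is_true))
    true_pos = int(np.sum(published & is_true))
    print(f"alpha={alpha:<7} papers/year={false_pos + true_pos:>3}  "
          f"false positives={false_pos:>3}  true positives={true_pos:>3}")
```

With these made-up parameters, tightening the threshold from 0.01 to 0.0001 essentially eliminates the false positives, but it also eliminates most of the true positives, because the dataset's power (the effect_size * sqrt(n) term) hasn't changed. That's the trade-off described above: only a bigger or cleaner dataset (larger n) lets you lower the threshold without losing the true positives, which is the "More data!" argument and its limits.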