The issue talked about here is distinct from the larger "reproducibility crisis"; the latter is the result of shoddily designed (or simply fraudulent) *experimental* work, whereas the issue here is the aggregate effect of the huge amount of *computational* work being done, even when that work is done correctly and honestly.

Testing a hypothesis against a pre-existing dataset is a valid thing to do, and it is also almost trivially simple (and completely free) for someone with a reasonable computational background. There are researchers who spend a decent portion of their careers performing these analyses. This is all well and good (we want people analyzing the highly complex data that modern science produces), but we run into problems with the statistics.

Suppose an analyst can test a hundred hypotheses per month (probably a low estimate). Each analysis, simplifying slightly, ends with a significance test that returns a p-value: the probability of seeing a result at least this strong if the hypothesis were actually false. If p < 0.01, the researcher writes up the analysis and sends it off to a journal for publication, since the chance of a false hypothesis clearing that threshold is *literally* one in a hundred. But you see the problem: even if we assume that this researcher tests *no valid hypotheses at all* over the course of a year, we would expect them to send out one paper per month, and each of those papers would look perfectly sound, with no methodological flaws for reviewers to complain about. (A quick simulation at the end of this comment makes the arithmetic concrete.)

In reality, of course, researchers sometimes test true hypotheses, and the ratio of true to false computational-analysis papers depends on the ratio of "true hypotheses that analysis successfully catches" to "false hypotheses that squeak by under the p-value threshold" (roughly, the True Positive rate vs. the False Positive rate, weighted by how often true hypotheses get tested in the first place). It's hard to guess what this ratio is, but if AAAS is calling things a "crisis," it's clearly lower than we would like.

But there's a further problem: the obvious solution, lowering the p-value threshold for publication, would lower *both* the False Positive rate and the True Positive rate. The p-value assigned to an analysis of a *true* hypothesis is limited by the statistical power (essentially, the size and quality) of the dataset being analyzed; lower the threshold too much, and analysts simply won't be able to make a sufficiently convincing case for any given true hypothesis. It's not a given that there is any p-value threshold at which the True Positive/False Positive ratio is much better than it is now.

"More data!" is the other commonly proposed solution, since we can safely lower the p-value threshold if we have the data to back up true hypotheses. But even if we ramp up experimental throughput enough to produce True Positives at p < 0.0001, that just means computational researchers can explore more complicated hypotheses, until they're testing thousands or millions of hypotheses per month, and then we have the same problem. In a race between "bench work" and "human creativity plus computer science," I know which I'd bet on.
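
To make the arithmetic concrete, here's a minimal simulation sketch in Python. Every number in it is an assumption chosen for illustration (the fraction of hypotheses that are true, the effect size, and the sample size per analysis are made up, not taken from anywhere); it models each analysis as a one-sided z-test and counts how many true and false hypotheses clear various p-value thresholds over a year of work.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# All parameters below are assumptions for illustration only.
months = 12
tests_per_month = 100   # the "hundred hypotheses per month" above
true_fraction = 0.05    # assumed: 1 in 20 tested hypotheses is actually true
effect_size = 0.3       # assumed standardized effect size for true hypotheses
n = 50                  # assumed observations per analysis (drives power)

n_tests = months * tests_per_month
is_true = rng.random(n_tests) < true_fraction

# Each analysis is modeled as a one-sided z-test: the test statistic is
# ~N(0, 1) when the hypothesis is false and ~N(effect_size * sqrt(n), 1)
# when it is true.
z = rng.normal(loc=np.where(is_true, effect_size * np.sqrt(n), 0.0), scale=1.0)
p = norm.sf(z)          # one-sided p-value

for alpha in (0.05, 0.01, 0.001, 0.0001):
    published = p < alpha
    false_pos = int(np.sum(published & ~is_true))
    true_pos = int(np.sum(published & is_true))
    print(f"alpha={alpha:<7} papers/year={false_pos + true_pos:>3}  "
          f"false positives={false_pos:>3}  true positives={true_pos:>3}")
```

With these made-up parameters, tightening the threshold from 0.01 to 0.0001 essentially eliminates the false positives, but it also eliminates most of the true positives, because the dataset's power (the effect_size * sqrt(n) term) hasn't changed. That's the trade-off described above: only a bigger or cleaner dataset (larger n) lets you lower the threshold without losing the true positives, which is the "More data!" argument and its limits.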