The Need for Reproducibility in Academic Research

65 points by bmahmood about 13 years ago

5 comments

joe_the_user about 13 years ago
This topic presents a serious dilemma.

One aspect of science that doesn't get much attention in this debate is the role of the scientist as an ethical and idealistic actor; to be a scientist is (or was) to have a higher calling, to help humanity get closer to the truth. And this is crucial to science itself, because scientists need to be able to *trust* other scientists. Neither everyone-watches-everyone-style trust nor you-will-be-punished-harshly-if-caught trust works. You need I-do-it-because-I-believe-in-it trust to make science work.

Now, the more that graduate students are made disposable, the more that professors live in a ruthless, sink-or-swim environment, and so forth, the less a scientist is likely to remain an idealist interested first and foremost in discovering the truth, and the less that crucial element of trust will remain.

The latest fad is "outsourcing science". If we want to make science less broken, it seems like we should be going in the opposite direction.
alttag about 13 years ago
I'm not in a medical field, but the problem likely exists in our discipline as well.

The issue, I suspect, stems from the nature of publishing: top-tier journals only publish "interesting" research, which means reproducing research is less welcome and, if performed, needs to be accompanied by a serious value-add.

There is no incentive to reproduce. It makes it more difficult to publish. It doesn't lead to tenure. Why bother?
jboggan about 13 years ago
Coming from a computationally intensive discipline in academia, it is astounding how difficult it can be for researchers to reproduce their own results. The tendency is to write just enough code to generate an impressive diagram for a journal illustration or presentation slide and move on. It's not uncommon to not know what date or version of a constantly shifting public data set the original result was generated from, or even where the scripts are located six months down the road. I tied myself in knots trying to iron out data bugs and irregularities that forced me to dump a year of research and recreate the entire upstream data pipeline in my lab.

In another example, a very promising cancer drug prediction algorithm (with fascinating in vitro results tested by an affiliated lab) was abandoned because of a key researcher's untimely death and the complete lack of version control anywhere in the lab. The paper had already been published (thankfully), but we literally had no idea where the code and the intermediary data were. We had a ~5,000-node GPFS cluster with rolling backups, but it didn't help at all because all the development was done locally; the situation was the same across the lab. The decision of the PI in the wake of this compound tragedy was to have lab members pair up, "cross train" each other for an hour, and verbally tell each other where they kept their important data.

Returning to the corrupted data issue I personally experienced: I unfortunately discovered it the night before a multi-departmental research presentation. There were numerous reversed edges in a large digraph, due to improper integration of two data sets before my involvement (I was also at fault for trusting internal data). I told the PI about it in the morning, since the problem ran so deep, and said I couldn't present anything because every single result of the past year was invalidated by the bug I had found. His response: present anyway. I refused. That did not go over well.

I'd like to see every computational paper (especially in biology, where these methods end up influencing human clinical medicine) include all source code in a public repository, but it isn't going to happen. Labs would lose their edge if they had to tell competitors what model weights they had iterated to in creating their newest prediction algorithms, and university technology transfer departments would have greater difficulty patenting these methods and selling them to drug companies. The current model will not change, but a new one might supplant it.

I wasn't on the cancer drug prediction project, but I probably know enough about it to reconstruct it. It actually seems like a great candidate for an open source project.
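A cheap guard against exactly this kind of reversed-edge bug is a merge-time consistency check. Here is a minimal sketch in Python (the CSV layout and the "src"/"dst" column names are illustrative assumptions, not the real pipeline): it unions two directed edge lists but refuses to proceed when the sources disagree about an edge's orientation.

    # Sketch: detect orientation conflicts when merging two directed edge lists.
    # The file layout and "src"/"dst" column names are illustrative assumptions.
    import csv

    def load_edges(path):
        # Read a two-column CSV of directed edges into a set of (src, dst) tuples.
        with open(path, newline="") as f:
            return {(row["src"], row["dst"]) for row in csv.DictReader(f)}

    def merge_with_orientation_check(edges_a, edges_b):
        # Union the edge sets, but flag any edge whose reverse appears in the
        # other source: either the sources disagree on direction, or one was
        # exported with its columns swapped; both cases need manual review.
        conflicts = {(u, v) for (u, v) in edges_a if (v, u) in edges_b}
        conflicts |= {(u, v) for (u, v) in edges_b if (v, u) in edges_a}
        return edges_a | edges_b, conflicts

    a = load_edges("dataset_a.csv")
    b = load_edges("dataset_b.csv")
    merged, conflicts = merge_with_orientation_check(a, b)
    if conflicts:
        raise SystemExit(f"{len(conflicts)} edges with inconsistent direction; refusing to merge")
    print(f"merged {len(merged)} edges cleanly")

The same habit helps with the "which version of the data set was this?" problem: recording a checksum of the snapshot next to the generated results at least makes that question answerable six months later.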
ExpiredLink about 13 years ago
"The development of new therapies to treat disease" shouldn't be called "Academic Research". It's part of the pharmaceutical industry.
gtani about 13 years ago
Good analysis of providing source code and datasets, and of the potential burdens on reviewers and authors:

http://nlpers.blogspot.com/2011/03/some-thoughts-on-supplementary.html