How big data has created a big crisis in science

44 points by kouzant, over 6 years ago

7 comments

x3tm, over 6 years ago
Not sure what big data has to do with this. It is a problem that has always existed, particularly in the "soft" sciences and biology. It's certainly not the case in physics, for instance.

Big data may bring another dimension to the problem once deep learning is used in science. However, that's a detail, and we're not there yet.

Whether a paper is reproducible or not is not problematic per se; that is not what defines science. The real problem arises when 1/ a big claim or discovery is made in a paper that is not reproducible, 2/ nobody tries to check the results independently, and 3/ the community nevertheless takes the paper seriously and accepts its findings. None of this has anything to do with the use of statistics (unless the whole community makes the exact same errors) or with big data.
iso1337, over 6 years ago
I don't think the author has the causality right here, at least for the biosciences. The statistical problems existed long before omics, big data, etc.

Most graduate programs don't require students to take statistics, or if they do, it's very cursory. Furthermore, students often learn very little about assay design: they end up thinking that non-linear responses are linear and do things like divide assay signals to get ratios (two sins here: assuming the assay response intercepts at 0 and that it's linear).

So at least for the biosciences, it's been a shitshow for a while.
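To make the two "sins" concrete, here is a minimal Python sketch with made-up numbers. The linear-with-background and saturating response models, and every parameter value, are hypothetical and chosen only for illustration; in both cases the true concentration ratio is 2, but dividing raw signals reports 4/3.

```python
# Hypothetical assay models; all parameter values are illustrative, not measured.

def linear_signal(conc, intercept, slope):
    """Linear assay response with a nonzero background: intercept + slope * conc."""
    return intercept + slope * conc


def saturating_signal(conc, s_max, k_half):
    """Saturating (Michaelis-Menten-like) response with zero intercept."""
    return s_max * conc / (k_half + conc)


low, high = 1.0, 2.0  # true concentrations: the true ratio is 2.0

# Sin 1: dividing raw signals when the response does not pass through 0.
s_low = linear_signal(low, intercept=10.0, slope=5.0)    # 15.0
s_high = linear_signal(high, intercept=10.0, slope=5.0)  # 20.0
naive_ratio = s_high / s_low                              # 4/3, not 2
background_corrected = (s_high - 10.0) / (s_low - 10.0)  # 2.0 after subtracting the intercept

# Sin 2: dividing signals when the response is not linear in concentration.
t_low = saturating_signal(low, s_max=100.0, k_half=1.0)    # 50.0
t_high = saturating_signal(high, s_max=100.0, k_half=1.0)  # 66.67
saturating_ratio = t_high / t_low                           # 4/3 again, not 2

print(f"true ratio:               {high / low:.3f}")
print(f"naive ratio (intercept):  {naive_ratio:.3f}")
print(f"background-corrected:     {background_corrected:.3f}")
print(f"naive ratio (saturating): {saturating_ratio:.3f}")
```

Subtracting a measured blank handles the first problem; for the second, concentrations have to be read off a calibration curve rather than obtained by dividing raw signals.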
sgt101, over 6 years ago
Data-driven hypotheses have always been central to science, but the trick is that they are used to generate a theory which produces a prediction that's not (yet) seen in the data and that can then be tested with statistical methods.
rossdavidh, over 6 years ago
For an article on the proper use of statistics in science, this is rather short on data for an empirical test. For example, did studies from the pre-big-data era (whenever you think that was) actually have a higher rate of reproducibility? If this has been demonstrated, I am not aware of it, and we are certainly not given a reference to such data in this article.
matchagaucho, over 6 years ago
Seems like the failure is in paper editing and review... why are these "findings" getting published at all?
rafiki6, over 6 years ago
I don't see an issue with having a data collection and data engineering function that operates separately from the scientists who create hypotheses; they can then search data catalogs and libraries to serve their hypotheses, no? It seems the author has confused the ability to collect and process data with running an experiment. Further, does cross-validation not apply in the sciences? Does sample size not apply? In the apocryphal example given in the story, wouldn't a study get tossed if it used a sample size of 8 data points to begin with? And wouldn't it be really, really stupid for scientists attempting to reproduce the study to go and reuse the same 8 samples?
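The two questions at the end can be illustrated with a Python sketch that assumes a hypothetical normally distributed population (mean 0, standard deviation 1). Re-running the analysis on the same 8 samples reproduces the original estimate exactly, so it tests nothing, while the spread of estimates across genuinely fresh samples shrinks roughly as 1/sqrt(n).

```python
import random
import statistics


def draw(rng, n, true_mean=0.0, sd=1.0):
    """n draws from a hypothetical Normal(true_mean, sd) population."""
    return [rng.gauss(true_mean, sd) for _ in range(n)]


def spread_of_estimates(n, repeats=1000, seed=1):
    """Standard deviation of the sample mean across many independent studies of size n."""
    rng = random.Random(seed)
    means = [statistics.fmean(draw(rng, n)) for _ in range(repeats)]
    return statistics.stdev(means)


rng = random.Random(42)
original = draw(rng, 8)
original_estimate = statistics.fmean(original)

# "Replicating" on the very same 8 data points just recomputes the same number.
same_samples_estimate = statistics.fmean(list(original))

# A real replication draws new samples from the population.
fresh_estimate = statistics.fmean(draw(rng, 8))

print(f"original: {original_estimate:.3f}  same samples: {same_samples_estimate:.3f}  "
      f"fresh samples: {fresh_estimate:.3f}")
print(f"spread of estimates, n=8:  {spread_of_estimates(8):.3f}")
print(f"spread of estimates, n=80: {spread_of_estimates(80):.3f}")
```

With a true standard deviation of 1, the theoretical spreads are 1/sqrt(8) ≈ 0.35 and 1/sqrt(80) ≈ 0.11, so an 8-sample study leaves a lot of room for a chance result.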
guscost, over 6 years ago
Science Has Only Two Legs: https://m-cacm.acm.org/magazines/2010/9/98038-science-has-only-two-legs/fulltext