
Twenty-nine teams use same dataset, find contradicting results [pdf]

172 points by alexleavitt about 9 years ago

9 comments

entee about 9 years ago
This paper is awesome because it transparently folds the analytical approach into the experiment being conducted.

There are two kinds of scientific study: those where you can run another experiment (ideally one approaching the same question orthogonally) along with rigorous controls, and those where you can't.

The first type is much less likely to have results vary based on analytical technique (effectively, the second experiment is a new analytical technique). Of course it does happen sometimes, and sometimes the studies are wrong; still, more controls and more experiments are always better.

However, studies where you're limited by ethical or practical constraints (i.e. most experiments involving humans) don't have that luxury and are therefore far more contingent on decisions made at the analysis stage. What's awesome about this paper is that it gets around this limitation by trying different analytical methods, effectively treating each as a new "experiment" and seeing if they all reach the same consensus.

Interestingly, very few features in the analysis were shared among a large fraction of the teams (only 2 features were used by more than 50% of teams), which suggests that no matter the method, the result holds. A similar approach to open data and distributed analysis would be a really great way to eliminate some of the recent trouble with reproducibility in the broader scientific literature.
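A rough way to picture that feature-overlap observation is to tally how many teams used each covariate and keep those used by a majority. A minimal sketch follows; the team names and feature sets are invented placeholders, not the paper's actual data.

```python
# Hypothetical sketch: count how many analysis teams shared each
# covariate. All team names and features here are invented.
from collections import Counter

teams = {
    "team_a": {"player_position", "league", "referee_country"},
    "team_b": {"player_position", "goals_scored"},
    "team_c": {"player_position", "league"},
    "team_d": {"height", "league"},
}

usage = Counter(f for features in teams.values() for f in features)

# Features used by more than half of the teams -- in the actual paper,
# only two covariates crossed this threshold across the 29 teams.
majority = [f for f, n in usage.items() if n > len(teams) / 2]
print(majority)
```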
dang about 9 years ago
A blog post giving background is at http://www.nature.com/news/crowdsourced-research-many-hands-make-tight-work-1.18508.
SilasX about 9 years ago
Reminds me of the idea (Robin Hanson's, I think?) to add an extra layer of blindness to studies: during peer review, take the original data and write a separate paper with the opposite conclusion. Randomize which reviewers get which version. Your original paper is then accepted only if they reject the inverted version.
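A minimal sketch of just the randomization step in that proposed scheme; the reviewer names and version labels are placeholders, not any real system's interface.

```python
# Blindly assign each reviewer either the original manuscript or the
# inverted-conclusion version. Names are hypothetical placeholders.
import random

reviewers = ["rev_a", "rev_b", "rev_c", "rev_d"]
versions = ["original", "inverted"] * (len(reviewers) // 2)
random.shuffle(versions)

assignment = dict(zip(reviewers, versions))
print(assignment)
# Under the proposal, the original is accepted only if the reviewers
# who received the "inverted" manuscript rejected it.
```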
sndean about 9 years ago
FiveThirtyEight did a write-up of this paper (part 2):

http://fivethirtyeight.com/features/science-isnt-broken/

On the bright side, if you look at the 95% CIs for the 29 studies, almost all of them overlap.
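That overlap claim is easy to check mechanically: two intervals overlap unless one ends before the other begins. A sketch with invented intervals (not the paper's actual estimates):

```python
# Check how many pairs of 95% confidence intervals overlap.
# The intervals below are invented for illustration.
from itertools import combinations

cis = [(0.89, 1.56), (1.02, 1.73), (0.96, 1.61), (1.10, 1.95)]

def overlaps(a, b):
    """Two intervals overlap unless one ends before the other begins."""
    return a[0] <= b[1] and b[0] <= a[1]

pairs = list(combinations(cis, 2))
n_overlap = sum(overlaps(a, b) for a, b in pairs)
print(f"{n_overlap} of {len(pairs)} CI pairs overlap")
```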
joe_the_user about 9 years ago
"The primary research question tested in the crowdsourced project was whether soccer players with dark skin tone are more likely than light skin toned players to receive red cards from referees."

This seems like a topic where one typically winds up with a multitude of competing conclusions.

Among other factors, we have:

* Pre-existing beliefs on the part of researchers.

* Lack of sufficient data.

* Difficulty in defining hypotheses (is there a skin-tone cut-off, or should one look for degrees of skin tone and degrees of prejudice? Should one look at all referees or only some referees?).

Given this, I'd say it's a mistake to expect just numeric data, at the level of complex social interactions, to be anything like clear or unambiguous. If studies on topics such as this have value, they have to involve careful arguments concerning data collection, data normalization/massaging, and only then data analysis and conclusions.

But a lot of the context comes from the prevalence of shoddy studies that assume you can throw data in a bucket and draw conclusions, further facilitated by having those conclusions echoed by mainstream media or by the media of one's chosen ideology.
krick about 9 years ago
I understand how tempting it is in our age of big data and all that stuff to perceive this as some curious new phenomenon, but it really is not. This is precisely the reason we came up with criteria for "science" quite a while ago. And in fact, this whole experiment is pretty meaningless.

So, for starters: 29 students get the same question on a math/physics/chemistry exam and give 29 different answers. Breaking news? Obviously not. Either the question was outrageously badly worded (not such a rare thing, sadly), or the students didn't do very well and we've got at most 1 correct answer.

Basically, we've got the very same situation here, except our "students" were doing statistics, which is not really math and not really natural science. Which is why it is somehow "acceptable" to end up with results like that.

If we are doing math, whatever result we get must be backed up with a formally correct proof. Which doesn't mean, of course, that 2 good students cannot get contradicting results, but at least one of their proofs is faulty, which can be shown. And this is how we decide what's "correct".

If we are doing science (e.g. physics), our question must be formulated in such a way that it is verifiable by setting up an experiment. If the experiment didn't give us what we expected, our theory is wrong. If it did, it might be correct.

Here, our original question was "if players with dark skin tone are more likely than light skin toned players to receive red cards from referees", which is shit, and not a scientific hypothesis. We can define "more likely" however we want. What we really want to know is whether, during the next N matches happening in what we can consider "the same environment", black athletes are going to get more red cards than white athletes. Which is quite obviously a bad idea for a study, because the number of trials we need is too big for such a loosely defined setting: not even 1 game will actually happen in an isolated environment, players will be different, referees will be different, and each game will change the "state" of our world. Somebody might even say that the whole culture has changed since we started the experiment, so whatever the first dataset was, it's no longer relevant.

Statistics is only a tool, not a "science", as some people might (incorrectly) assume. It is not the fault of the methods we apply that we get something like this, but rather of the discipline we apply them to. And "results" like that are why physics is accepted as a science, and sociology never really was.
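The "we can define 'more likely' however we want" point is concrete: each definition is a different test. Below is one of many possible operationalizations, a two-proportion z-test on red cards per player-game; the counts are invented, and an equally defensible alternative (rates per player-referee dyad, per minute played, adjusted for position) could yield a different answer.

```python
# One hypothetical operationalization of "more likely": compare
# red-card rates per player-game with a two-proportion z-test.
# All counts below are invented for illustration.
from math import sqrt, erf

cards_dark, games_dark = 30, 10_000
cards_light, games_light = 20, 10_000

p1, p2 = cards_dark / games_dark, cards_light / games_light
p_pool = (cards_dark + cards_light) / (games_dark + games_light)
se = sqrt(p_pool * (1 - p_pool) * (1 / games_dark + 1 / games_light))
z = (p1 - p2) / se

# Two-sided p-value from the standard normal CDF.
p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
print(f"z = {z:.2f}, p = {p_value:.3f}")
```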
DiabloD3 about 9 years ago
So, does this mean every team used improper methodology? Or can we meta-review the results and figure out what's really going on?
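One standard way to do that meta-review is to pool the teams' effect estimates with a random-effects model. A hedged sketch using the DerSimonian-Laird moment estimator; the per-team log odds ratios and standard errors are invented placeholders, not the paper's numbers.

```python
# Pool per-team effect estimates with a DerSimonian-Laird
# random-effects meta-analysis. Inputs are hypothetical.
from math import sqrt

log_or = [0.25, 0.10, 0.40, 0.05, 0.30]   # invented per-team estimates
se =     [0.12, 0.15, 0.20, 0.10, 0.18]   # invented standard errors

w = [1 / s**2 for s in se]                 # fixed-effect weights
pooled_fe = sum(wi * y for wi, y in zip(w, log_or)) / sum(w)

# Between-team heterogeneity (tau^2) via the DL moment estimator.
q = sum(wi * (y - pooled_fe) ** 2 for wi, y in zip(w, log_or))
c = sum(w) - sum(wi**2 for wi in w) / sum(w)
tau2 = max(0.0, (q - (len(log_or) - 1)) / c)

w_re = [1 / (s**2 + tau2) for s in se]     # random-effects weights
pooled_re = sum(wi * y for wi, y in zip(w_re, log_or)) / sum(w_re)
se_re = sqrt(1 / sum(w_re))
print(f"pooled log OR = {pooled_re:.3f} +/- {1.96 * se_re:.3f}")
```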
LunaSea about 9 years ago
Of course it's the social "sciences".
hmate9 about 9 years ago
Lies, damned lies, and statistics:

https://en.wikipedia.org/wiki/Lies,_damned_lies,_and_statistics

Statistics can be manipulated surprisingly easily.
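One of the easiest manipulations is simply running many tests: on pure noise, about 1 in 20 tests will clear p < 0.05 by chance. A self-contained sketch of that multiple-comparisons effect, with no real effect anywhere in the data:

```python
# Demonstrate false positives from multiple comparisons: compare many
# pairs of pure-noise samples and count "significant" differences.
import random

random.seed(1)
n_tests, n = 20, 30
false_hits = 0

for _ in range(n_tests):
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    diff = sum(a) / n - sum(b) / n
    # Crude z-test; with known sigma = 1, the SE of the difference
    # in means is sqrt(2/n).
    z = diff / (2 / n) ** 0.5
    false_hits += abs(z) > 1.96

print(f"{false_hits} 'significant' results out of {n_tests} pure-noise tests")
```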