
Twenty-nine teams use same dataset, find contradicting results [pdf]

172 points by alexleavitt about 9 years ago

9 comments

entee about 9 years ago
This paper is awesome because it transparently folds the analytical approach into the experiment being conducted.

There are two kinds of scientific study: those where you can run another experiment (ideally one approaching the same question orthogonally) along with rigorous controls, and those where you can't.

The first type is much less likely to have results vary based on analytical technique (effectively, the second experiment is a new analytical technique). Of course it does happen sometimes, and sometimes the studies are wrong, but more controls and more experiments are always better.

However, studies where you're limited by ethical or practical constraints (i.e. most experiments involving humans) don't have that luxury and are therefore far more contingent on decisions made at the analysis stage. What's awesome about this paper is that it gets around this limitation by trying different analytical methods, each effectively being a new "experiment", and seeing if they all reach the same consensus.

Interestingly, very few features in the analysis were shared among a large fraction of the teams (only 2 features were used by more than 50% of teams), which suggests that the result holds no matter the method. A similar approach to open data and distributed analysis would be a really great way to eliminate some of the recent trouble with reproducibility in the broader scientific literature.
dang about 9 years ago
A blog post giving background is at http://www.nature.com/news/crowdsourced-research-many-hands-make-tight-work-1.18508
SilasX about 9 years ago
Reminds me of the idea (Robin Hanson's, I think?) to add an extra layer of blindness to studies: during peer review, take the original data and write a separate paper with the opposite conclusion. Randomize which reviewers get which version. Your original paper is then only accepted if they reject the inverted version.
sndean about 9 years ago
FiveThirtyEight did a write-up of this paper (part 2):

http://fivethirtyeight.com/features/science-isnt-broken/

On the bright side, if you look at the 95% CIs for the 29 studies, almost all of them overlap.
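(A minimal sketch of that overlap check, using made-up per-team estimates rather than the paper's actual numbers: given each team's effect estimate and standard error on the log-odds scale, build normal-approximation 95% CIs and count how many pairs overlap.)

```python
import math
from itertools import combinations

# Hypothetical per-team estimates (log odds ratio, standard error).
# These are illustrative numbers, NOT the values reported in the paper.
teams = {
    "team_a": (0.30, 0.12),
    "team_b": (0.18, 0.20),
    "team_c": (0.05, 0.15),
    "team_d": (0.42, 0.25),
}

def ci95(est, se):
    """Normal-approximation 95% confidence interval."""
    half = 1.96 * se
    return est - half, est + half

intervals = {name: ci95(est, se) for name, (est, se) in teams.items()}

overlapping = sum(
    1
    for (_, (lo1, hi1)), (_, (lo2, hi2)) in combinations(intervals.items(), 2)
    if lo1 <= hi2 and lo2 <= hi1  # intervals share at least one point
)
total_pairs = len(teams) * (len(teams) - 1) // 2
print(f"{overlapping} of {total_pairs} CI pairs overlap")
```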
joe_the_user about 9 years ago
"The primary research question tested in the crowdsourced project was whether soccer players with dark skin tone are more likely than light skin toned players to receive red cards from referees."

This seems like a topic where one indeed typically winds up with a multitude of competing conclusions.

Among other factors, we have:

* Pre-existing beliefs on the part of researchers.

* Lack of sufficient data.

* Difficulty in defining hypotheses (is there a skin tone cut-off, or should one look for degrees of skin tone and degrees of prejudice; should one look at all referees or only some referees?).

Given this, I'd say it's a mistake to expect just numeric data, at the level of complex social interactions, to be anything like clear or unambiguous. If studies on topics such as this have value, they have to involve careful arguments concerning data collection, data normalization/massaging, and only then data analysis and conclusions.

But a lot of the context comes from the prevalence of shoddy studies that assume you can throw data in a bucket and draw conclusions, further facilitated by having those conclusions echoed by mainstream media or by the media of one's chosen ideology.
krick about 9 years ago
I understand how tempting it is, in our age of big data and all that stuff, to perceive this as some curious new phenomenon, but it really is not. This is precisely the reason we came up with some criteria for "science" quite a while ago. And in fact, this whole experiment is pretty meaningless.

So, for starters: 29 students get the same question on a math/physics/chemistry exam and give 29 different answers. Breaking news? Obviously not. Either the question was outrageously badly worded (not such a rare thing, sadly), or the students didn't do very well and we've got at most 1 correct answer.

Basically, we've got the very same situation here. Except our "students" were doing statistics, which is not really math and not really natural science. Which is why it is somehow "acceptable" to end up with results like that.

If we are doing math, whatever result we get must be backed up with a formally correct proof. Which doesn't mean, of course, that 2 good students cannot get contradicting results, but at least one of their proofs is faulty, which can be shown. And this is how we decide what's "correct".

If we are doing science (e.g. physics), our question must be formulated in such a way that it is verifiable by setting up an experiment. If the experiment didn't give us what we expected, our theory is wrong. If it did, it might be correct.

Here, our original question was "if players with dark skin tone are more likely than light skin toned players to receive red cards from referees", which is shit, and not a scientific hypothesis. We can define "more likely" however we want. What we really want to know is whether, during the next N matches happening in what we can consider "the same environment", black athletes are going to get more red cards than white athletes. Which is quite obviously a bad idea for a study, because the number of trials we need is too big for such a loosely defined setting: not even one game will actually happen in an isolated environment, players will be different, referees will be different, and each game will change the "state" of our world. Somebody might even say that the whole culture has changed since we started the experiment, so obviously whatever the first dataset was, it's no longer relevant.

Statistics is only a tool, not a "science", as some people might (incorrectly) assume. It is not the fault of the methods we apply that we get something like that, but rather of the discipline we apply them to. And "results" like this are why physics is accepted as a science, and sociology never really was.
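(To illustrate the "we can define 'more likely' however we want" point: a minimal sketch on entirely invented toy data, not the paper's dataset, where two defensible analysis choices, pooling all games versus comparing within each league, point in opposite directions.)

```python
# Illustrative toy data (entirely made up, not from the paper):
# (league, skin_tone) -> (red_cards, games)
data = {
    ("league_1", "dark"):  (30, 200),
    ("league_1", "light"): (10, 50),
    ("league_2", "dark"):  (4, 100),
    ("league_2", "light"): (20, 400),
}

def rate(cards, games):
    return cards / games

# Choice A: pool everything and compare raw red-card rates per game.
pooled = {"dark": [0, 0], "light": [0, 0]}
for (_, tone), (cards, games) in data.items():
    pooled[tone][0] += cards
    pooled[tone][1] += games
print("Pooled rates:",
      {tone: round(rate(c, g), 3) for tone, (c, g) in pooled.items()})

# Choice B: compare rates within each league (controlling for league).
for league in ("league_1", "league_2"):
    rates = {tone: round(rate(*data[(league, tone)]), 3)
             for tone in ("dark", "light")}
    print(f"{league} rates:", rates)

# With these made-up numbers the pooled comparison and the within-league
# comparisons disagree (a Simpson's-paradox style reversal), so the answer
# to "more likely" depends on which analysis you decide is the right one.
```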
DiabloD3 about 9 years ago
So, does this mean every team used improper methodology? Or can we meta-review the results and figure out what's really going on?
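(One standard way to "meta-review" the 29 estimates would be a random-effects meta-analysis. A minimal DerSimonian-Laird style sketch with placeholder numbers, not the teams' actual estimates:)

```python
import math

# Placeholder per-team estimates (log odds ratio, standard error);
# illustrative only, not the values from the paper.
estimates = [(0.31, 0.10), (0.12, 0.18), (0.05, 0.14), (0.40, 0.22), (0.22, 0.12)]

# Fixed-effect (inverse-variance) pooling.
w = [1 / se**2 for _, se in estimates]
fixed = sum(wi * est for wi, (est, _) in zip(w, estimates)) / sum(w)

# DerSimonian-Laird estimate of the between-team variance tau^2.
q = sum(wi * (est - fixed) ** 2 for wi, (est, _) in zip(w, estimates))
df = len(estimates) - 1
c = sum(w) - sum(wi**2 for wi in w) / sum(w)
tau2 = max(0.0, (q - df) / c)

# Random-effects pooling: weights flatten out as tau^2 grows.
w_re = [1 / (se**2 + tau2) for _, se in estimates]
pooled = sum(wi * est for wi, (est, _) in zip(w_re, estimates)) / sum(w_re)
se_pooled = math.sqrt(1 / sum(w_re))

print(f"heterogeneity Q = {q:.2f}, tau^2 = {tau2:.3f}")
print(f"pooled log OR = {pooled:.3f} "
      f"(95% CI {pooled - 1.96*se_pooled:.3f} to {pooled + 1.96*se_pooled:.3f})")
```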
LunaSea about 9 years ago
Of course it's the social "sciences".
hmate9 about 9 years ago
Lies, damned lies, and statistics

https://en.wikipedia.org/wiki/Lies,_damned_lies,_and_statistics

Statistics can be manipulated surprisingly easily.
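(A small simulated sketch of how easy it is: test enough arbitrary subgroups of pure noise and at least one will usually look "significant". Everything below is invented data, not from any real study.)

```python
import math
import random

random.seed(1)

def p_value(xs):
    """Two-sided z-test that the mean of xs is zero (known unit variance)."""
    n = len(xs)
    z = (sum(xs) / n) * math.sqrt(n)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Pure noise: 20 arbitrary "subgroups" of 50 observations, no real effect anywhere.
subgroups = [[random.gauss(0, 1) for _ in range(50)] for _ in range(20)]
pvals = sorted(p_value(g) for g in subgroups)

print("smallest p-value:", round(pvals[0], 4))
print("subgroups 'significant' at p < 0.05:", sum(p < 0.05 for p in pvals))

# With 20 independent looks at noise, the chance that at least one of them
# crosses p < 0.05 by luck alone is about 1 - 0.95**20, roughly 64%.
```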