<i>> Seventy-three independent research teams used identical cross-country survey data to test an established social science hypothesis: that more immigration will reduce public support for government provision of social policies. Instead of convergence, teams’ numerical results varied greatly, ranging from large negative to large positive effects of immigration on public support. The choices made by the research teams in designing their statistical tests explain very little of this variation: a hidden universe of uncertainty remains.</i><p>That's... surprising, but keep in mind that we're talking about "soft" social sciences here.<p>Measuring soft quantities like "public support for government provision of social policies in response to increased immigration" is <i>very</i> different from measuring hard quantities like, say, the location of a rocket in space at a point in time using earth as a frame of reference.<p>I'm looking forward to seeing what Andrew Gelman and his colleagues will say about it at <a href="https://statmodeling.stat.columbia.edu/" rel="nofollow">https://statmodeling.stat.columbia.edu/</a> -- surely they will want to say something about it!
> Researchers’ expertise, prior beliefs, and expectations barely predict the wide variation in research outcomes. More than 90% of the total variance in numerical results remains unexplained even after accounting for research decisions [including which statistical tests to perform] identified via qualitative coding of each team’s workflow.<p>They're saying that scientists, given the same data, will achieve widely different results, and that they can't even work out how they're getting these different results. That's surprising and concerning.<p>> This reveals a universe of uncertainty that is otherwise hidden when considering a single study in isolation. The idiosyncratic nature of how researchers’ results and conclusions varied is a new explanation for why many scientific hypotheses remain contested.
<i>It calls for greater humility and clarity in reporting scientific findings.</i><p>Hmmm, I think the variance in results is more of an issue in the softer sciences, and the conclusions would be better worded to reflect that context rather than implying they hold to a similar extent across science generally.<p>There is no doubt that human failings also challenge physics etc. (see previous threads here), and humility and clarity are appropriate there too, but the problems highlighted in this study seem particular to areas of research that have long been regarded as less sure because of the nature of their domain.
There are many problems with this kind of hypothesis:
(a) any hypothesis should predict/explain facts that are different from the original data/facts it was built to account for; otherwise, it is ad hoc.
(b) multiple competing hypotheses can account for the same set of facts. So how should one pick among the competing hypotheses? Here the philosophy of science comes to the rescue: pick the hypothesis that explains/solves more facts/problems, and those facts/problems should be novel ones, not the original ones.
I think this effect has been recognized in science for decades, even if it wasn't talked about much in print.<p>Recently I thought about trying to combat this effect myself by using multiple different analysis methods. For example, in my case I'm thinking about trying multiple different regression approaches. My research is in the physical sciences, so I don't think in this case the overall conclusions will be that different because the general trend is clear from a scatter plot (often not true in social science...), but it'll still be an interesting exercise.
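As a minimal sketch of that exercise (synthetic data standing in for a real dataset; numpy and scipy assumed), fitting the same noisy scatter with ordinary least squares, Theil-Sen, and Siegel estimators shows how much the fitted slope can move between reasonable methods:

```python
# Sketch: fit the same noisy scatter with several reasonable regression
# estimators and compare the slopes they report. The data here is synthetic,
# standing in for a real physical-science dataset.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 60)
y = 2.0 * x + 1.0 + rng.normal(scale=2.0, size=x.size)
y[::15] += 12.0  # a few outliers, the kind that makes estimators disagree

ols = stats.linregress(x, y)                        # ordinary least squares
ts_slope, ts_inter, _, _ = stats.theilslopes(y, x)  # Theil-Sen (median of pairwise slopes)
sg_slope, sg_inter = stats.siegelslopes(y, x)       # Siegel repeated medians

print(f"OLS slope:       {ols.slope:.3f}")
print(f"Theil-Sen slope: {ts_slope:.3f}")
print(f"Siegel slope:    {sg_slope:.3f}")
```

If the general trend really is clear from the scatter plot, the three slopes should land close together; where they diverge is exactly where the analyst's choice starts to matter.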
There is another paper by the same author about a similar phenomenon: "Secondary observer effects: idiosyncratic errors in small-N secondary data analysis" - it's available from Sci-Hub.
Remember the saying "figures don't lie, but liars figure"? I'm not saying any of these researchers engaged in malfeasance, but imagine if they had: you can see how easy it is to use "analysis" to say anything you want. The key takeaway is: don't just look at the results of the analysis, look at the data and at how they analyzed it.
We should be careful to differentiate between "social science" and hard science. One of these is flooded with questionable papers that cannot be replicated, while the other has fields where a p value of 3×10^-7 is required for a discovery to even be taken seriously.
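For context, the 3×10^-7 figure is roughly the one-tailed tail probability of a five-sigma fluctuation under a normal distribution, the conventional discovery threshold in particle physics; a quick check (scipy assumed):

```python
# The particle-physics "five sigma" convention as a one-tailed normal tail probability.
from scipy.stats import norm

p_five_sigma = norm.sf(5.0)   # survival function: P(Z > 5)
print(f"{p_five_sigma:.2e}")  # ~2.87e-07, i.e. roughly 3e-7
```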
I think the timing of articles like this, the recent alarms over 'reproducibility' and so on... are a sign that we finally have the tools and volume of data to prove that humans are not good with data.
Most subjects that have the word "science" in the name aren't sciences: social science, computer science, political science, etc.<p>Meanwhile, the real sciences don't have it in the name: physics, chemistry, biology, etc.