I understand how tempting it is, in our age of big data, to perceive this as some curious new phenomenon, but it really is not. This is precisely why we came up with criteria for "science" quite a while ago. And in fact, this whole experiment is pretty meaningless.<p>So, for starters: 29 students get the same question on a math/physics/chemistry exam and give 29 different answers. Breaking news? Obviously not. Either the question was outrageously badly worded (not such a rare thing, sadly), or the students didn't do very well and we've got at most one <i>correct</i> answer.<p>We've got the very same situation here, except our "students" were doing statistics, which is not really math and not really natural science. Which is why it is somehow "acceptable" to end up with results like that.<p>If we are doing math, whatever result we get must be backed up by a formally correct proof. That doesn't mean, of course, that two good students cannot arrive at contradicting results, but then at least one of their proofs is faulty, and this can be shown. That is how we decide what's "correct".<p>If we are doing science (e.g. physics), our question must be formulated in such a way that it is verifiable by setting up an experiment. If the experiment didn't give us what we expected, our theory is wrong. If it did, the theory <i>might</i> be correct.<p>Here, the original question was "if players with dark skin tone are more likely than light skin toned players to receive red cards from referees", which is a sloppy formulation, not a scientific hypothesis: we can define "more likely" however we want. What we really want to know is whether, during the next N matches happening in what we can consider "the same environment", black athletes are going to get more red cards than white athletes.
Which is quite obviously a bad idea for a study, because the number of trials we'd need is too big for such a loosely defined setting: not even one game will actually happen in an isolated environment, the players will be different, the referees will be different, and each game will change the "state" of our world. Somebody might even argue that the whole culture has changed since we started the experiment, so whatever the first dataset was, it's no longer relevant.<p>Statistics is only a tool, not a "science", as some people might (incorrectly) assume. It is not the fault of the methods we apply that we get something like this; it is the fault of the discipline we apply them to. And "results" like these are why physics is accepted as a science, and sociology never really was.
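<p>To make the point concrete, here is what an operational version of that question could look like once "more likely" is pinned down as a pre-registered comparison of red-card rates. This is a hypothetical sketch with made-up counts, not the study's actual method; the function names and numbers are mine.

```python
from math import sqrt

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Two-proportion z-statistic: do groups A and B have different rates?

    Under the null hypothesis (same underlying rate), the pooled rate
    estimates the common proportion and |z| > 1.96 rejects at the 5% level.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    p_pool = (successes_a + successes_b) / (n_a + n_b)  # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Hypothetical, made-up counts: red cards per player-appearance
# over the next N matches, dark-skinned vs light-skinned players.
z = two_proportion_z(30, 1000, 20, 1000)
```

Even this toy version shows the problem described above: with rare events like red cards, the sample sizes needed to detect a small rate difference are enormous, and by the time you've collected them the "environment" has long since changed.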