Something I found truly shocking is that, in the paper's data, the number of straight men is exactly the same as the number of gay men, and likewise for women (for individuals with at least one picture, i.e. everyone; the numbers for those with more than one picture differ).

The paper itself cites a 7% prevalence of gay men in the general population. Yet they trained on a 50/50, uniform distribution. Why?

Well, because a problem with unbalanced classes, like gay/straight individuals, where your target class (gay men/women) is less than a tenth of your entire dataset, is a bitch to train a classifier for. If you artificially equalise the data by just adding more examples of the minority class, you can get a pretty good "accuracy" score (precision over recall, which is what they used).

Except, of course, that score is completely useless as an estimate of your classifier's true accuracy in the real world, against the true distribution of your data, "in the wild". It's also completely useless as evidence for whatever hare-brained theory you want to posit involving, oh, say, the distribution of feminine and masculine features in gay and straight individuals' faces - you know, the point the paper was making.

This should be a cautionary tale: you can't just force a classifier to give you the results you want and then claim that those results prove your theory. That's just bad machine learning. Like bad statistics, but with more assumptions.
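
To make that concrete, here's a toy sketch - nothing to do with the paper's actual pipeline, and the sensitivity/specificity numbers are made up - of how a classifier that looks fine on a balanced 50/50 test set falls apart when evaluated at the ~7% base rate the paper itself cites:

    # Toy illustration only: a fake classifier with fixed sensitivity and
    # specificity, evaluated on a balanced test set vs. a ~7% base rate.
    import numpy as np

    rng = np.random.default_rng(0)

    def simulate(n, base_rate, sensitivity=0.9, specificity=0.8):
        """Simulate true labels at `base_rate` and a classifier that flags
        positives with `sensitivity` and rejects negatives with `specificity`."""
        y = rng.random(n) < base_rate                       # true labels
        flip = rng.random(n)
        pred = np.where(y, flip < sensitivity, flip > specificity)
        return y, pred

    def report(name, y, pred):
        tp = np.sum(pred & y)
        fp = np.sum(pred & ~y)
        acc = np.mean(pred == y)
        prec = tp / (tp + fp) if (tp + fp) else float("nan")
        print(f"{name:>13}: accuracy={acc:.2f}  precision={prec:.2f}")

    # Balanced evaluation, like the paper's 50/50 setup
    report("50/50 split", *simulate(100_000, base_rate=0.5))
    # Same classifier, at the ~7% prevalence the paper itself cites
    report("7% base rate", *simulate(100_000, base_rate=0.07))

On the balanced split both numbers look respectable (~0.85 and ~0.82 here); at the real-world base rate the precision drops to roughly 0.25, i.e. most of the people this toy classifier flags as gay are in fact straight.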