On the proper use of this data:<p>Perhaps the easiest way to think of it is that the phrases are predictors for the race/sex, not the other way around. For example, you shouldn't expect every white male you meet to like Van Halen. However, if someone says to you "I have a friend who's a big Van Halen fan", you're pretty safe in assuming that the friend is a white male.<p>Likewise, it might be that only 10% of blacks like soul food. But if almost no other demographics like it, it will still show up high on their list. So "is black" does not strongly imply "loves soul food", but "loves soul food" does strongly imply "is black".<p>In other words, <a href="http://en.wikipedia.org/wiki/Bayes_theorem" rel="nofollow">http://en.wikipedia.org/wiki/Bayes_theorem</a>
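To put rough numbers on the soul food example, here is a minimal sketch of the Bayes calculation. Only the 10% figure comes from the comment above. The group's share of users (10%) and the rate among everyone else (0.5%) are made-up assumptions for illustration, not OkCupid data, and `posterior` is just a hypothetical helper name.

```python
def posterior(prior, p_phrase_given_group, p_phrase_given_other):
    """P(group | phrase) by Bayes' theorem, for a group vs. everyone-else split."""
    evidence = prior * p_phrase_given_group + (1 - prior) * p_phrase_given_other
    return prior * p_phrase_given_group / evidence

# Assumed inputs (illustrative only, not real data).
p_group = 0.10              # assumed share of users who belong to the group
p_like_given_group = 0.10   # "only 10% ... like soul food", from the comment
p_like_given_other = 0.005  # assumed: almost nobody else lists it

p_group_given_like = posterior(p_group, p_like_given_group, p_like_given_other)
print(f"P(likes | group) = {p_like_given_group:.2f}")   # 0.10
print(f"P(group | likes) = {p_group_given_like:.2f}")   # 0.69
```

Under those assumptions the phrase predicts the group about 69% of the time, even though only 10% of the group uses the phrase. Change the assumed rates and the gap changes with them.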
There's just so much that they execute well on that I hate to pick any bit of it, but one thing everybody with linkbait should probably do is create something spiritually similar to the bar which pops up when you're done with the article. It is a force multiplier for all pillar content you write, it increases the viral factor, and the way it grabs someone's attention <i>just</i> when their brain is known to be vacant is sixteen flavors of brilliant.<p>I did something <i>very</i> similar for a client today, and after I get a little better at manipulating code to do it, I'm probably going to try something similar for getting trial signups. ("Looks like you're done reading about it. Feeling confused about what to do next? WHAM, signup box.")
Interesting read, though a couple of things come to mind. How do "the people who have OkCupid profiles" differ from "the general population"? I suspect several of the results are skewed by that population bias.<p>Also, they say most of their users are urban, but I'm curious whether people are prone to listing the nearest big city rather than where they really live. For instance, I suspect everyone within 45 minutes of Des Moines is listing themselves as living there, rather than the tiny farm town / suburb they really live in.
I found this, like most of the other blog posts by the OK Cupid team, pretty genius. I am glad the people who sit on top of this goldmine of social information have a good sense of humor.<p>It would be cool to see a statistician guest post. The OK Cupid people are great at coming up with ideas for analysis, but I'd love to see some solid stats behind some of their analyses.
It's interesting: I am Asian and like soul food, but it's not something that would occur to me to put on my profile. Similarly, I would write sashimi (if I ate it anymore) rather than sushi, and I suspect that non-Asians like sashimi just fine but wouldn't know to put it on... So the statistics point to cultural self-broadcasting, I think, more than preferences.
Those results are highly odd, but I don't think OkCupid is mainstream enough to really glean any insight into racial psychology (if such a thing exists) from their data. Furthermore, they don't provide enough info on their analysis method. I would be interested in seeing the results of a null run: randomly assigning profiles to groups (rather than by race) and seeing what "statistically distinct" phrases arise (if the analysis is valid, no phrases should arise).<p>It would also be interesting to see them do the same analysis for other features such as height, income, photo attractiveness, etc.<p>Similar analysis for craigslist personals by city: <a href="http://blog.kiwitobes.com/?p=42" rel="nofollow">http://blog.kiwitobes.com/?p=42</a>
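For what it's worth, the null run could look something like the sketch below. This is an assumption-heavy toy, not OkCupid's method: it only looks at single words, it treats "frequency" as the fraction of profiles containing the word, and the names `word_rates`, `max_ratio` and `null_run` are made up for illustration.

```python
import random
from collections import Counter

def word_rates(profiles):
    """Fraction of profiles that contain each word at least once."""
    counts = Counter()
    for text in profiles:
        counts.update(set(text.lower().split()))
    return {w: c / len(profiles) for w, c in counts.items()}

def max_ratio(profiles, labels, group):
    """Largest (rate within group) / (rate overall) over the group's words."""
    overall = word_rates(profiles)
    members = [t for t, g in zip(profiles, labels) if g == group]
    within = word_rates(members)
    return max(within[w] / overall[w] for w in within)

def null_run(profiles, labels, group, trials=1000, seed=0):
    """Shuffle the labels and report how often chance matches the real top ratio."""
    rng = random.Random(seed)
    observed = max_ratio(profiles, labels, group)
    shuffled = list(labels)  # shuffling keeps the group sizes unchanged
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if max_ratio(profiles, shuffled, group) >= observed:
            hits += 1
    return observed, hits / trials
```

If the second number that `null_run` returns is large, random groupings produce "distinct" phrases about as extreme as the real ones, which is the worrying case. Note that rare words can hit the maximum ratio purely by chance, so a real version would probably want a minimum count per phrase.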
I'm really surprised by how they didn't mention one of the most striking results from the data: Latinos on OKCupid are much more likely to have the word "stationed" in their profile than other demographics. Based on this, it looks like the military contains a large proportion of Latinos ("stationed in [location]"). What are the demographics of the military versus the general population?
Really interesting, but isn't this more about what white people want other people to think they like, rather than what they actually like?<p>It would be interesting to compare this to what they actually like, but I have no idea how to get that data.
Eh, their analysis method is not too hot. From the comments section:<p>"The phrases included in the black boxes are the top 50 phrases most statistically correlated to that group. We calculated this as follows:<p>1. We calculated the frequency of every 1, 2, and 3 word phrase for the whole population.
2. We calculated those same frequencies within each race/gender pair.
3. For each phrase, we divided #2 by #1.
4. This is the propensity of a given group to use a given phrase.
5. The list you see is the phrases with the 50 highest ratios of #2/#1."<p>So even if a group uses a phrase 1.001x more than the population average, it might still be listed, <i>if there are no actual phrase-usage differences</i> (i.e., all phrase ratios will be small, and the top 50 will be arbitrary).
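Here is a rough sketch of the five quoted steps, to make the objection concrete. The quote doesn't say how "frequency" is defined, so this assumes it means the fraction of profiles containing the phrase. The function names are hypothetical and this is not OkCupid's actual code.

```python
from collections import Counter

def phrases(text, max_n=3):
    """Yield every 1-, 2-, and 3-word phrase in a profile."""
    words = text.lower().split()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])

def phrase_rates(profiles):
    """Assumed definition of frequency: share of profiles containing the phrase."""
    counts = Counter()
    for text in profiles:
        counts.update(set(phrases(text)))
    return {p: c / len(profiles) for p, c in counts.items()}

def top_phrases(profiles, groups, group, k=50):
    """Rank a group's phrases by (rate within group) / (rate overall)."""
    overall = phrase_rates(profiles)                        # step 1
    members = [t for t, g in zip(profiles, groups) if g == group]
    within = phrase_rates(members)                          # step 2
    ratios = {p: within[p] / overall[p] for p in within}    # steps 3 and 4
    ranked = sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]                                       # step 5
```

Calling `top_phrases(["i like food", "i like food"], ["a", "b"], "a", k=2)` still returns two phrases, each with a ratio of exactly 1.0. The ranking always fills its k slots, whether or not the groups actually differ, so a cutoff on the ratio or a significance test would be needed before reading anything into the list.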
I thought it was interesting how the largest countries aren't the most nationalistic - no Brazil, Mexico, China, Japan, etc. I also came away with an identity crisis - #1 good food (Soul) and seeing Mos Def, Lupe Fiasco and Talib Kweli in the top "stuff"... dammit, I might be black.
I doubt I'm alone here, but when confronted with the "insert fucking theory" I promptly went through the list of what white people like, inserting "fucking" anywhere I could...<p>"Groundhog Fucking Day" kind of left a bad taste in my mouth.
I am white and like none of those things other than guitar and software. I suspect the case is similar for most of the HN readership.<p>This should be called "what the lowest common denominator likes".
It struck me as sort of funny that the minority groups each had a common self-description (cool, funny, simple), but the closest thing white guys have is "I'm a country boy".
That last stat about reading level bothers me. "Ok, before anyone gets offended about reading level vs race, let's show you a stat that confirms another stereotype: religious people are stupid! And atheists are smartest of all! Scientifically proven with a reading test based on the lengths of words, and metrics I just made up. And don't worry that almost half of the data points belie my analysis, ha ha ha, it confirms your prejudices, so it's ok!"