Really cool! Couple of comments:<p>1. I'm assuming you downloaded comment threads from the front page of each of the subreddits you looked at and then checked which subreddits each of the posters had commented in. How many requests did you end up making?<p>2. Did you hand-select the subreddits you analysed? If so, what criteria were you looking for?<p>3. Have you thought about doing any more research into this area? I made <a href="http://redditgraphs.com/" rel="nofollow">http://redditgraphs.com/</a> and was looking into ways of guessing a user's age and gender based on their commenting history. I found some papers about similar sites:<p>twitter: <a href="http://www.aclweb.org/anthology-new/D/D11/D11-1120.pdf" rel="nofollow">http://www.aclweb.org/anthology-new/D/D11/D11-1120.pdf</a><p>blogspot: <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.136.9952&rep=rep1&type=pdf" rel="nofollow">http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.136...</a><p>youtube: <a href="http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/38143.pdf" rel="nofollow">http://static.googleusercontent.com/external_content/untrust...</a> (This one looks the most promising; using their methods, you could treat subreddits like youtube videos to build more accurate profiles of communities and users. They also examine how speech patterns propagate, which captures the spread of some memes.)<p>Unfortunately, reddit doesn't have user profiles or name-like user names (so there isn't an easily available training set), and I was having difficulty organizing and analyzing the large amount of data I was downloading, so I put the project aside.
There has been basically no research specific to reddit (<a href="http://scholar.google.com/scholar?as_ylo=2008&q=reddit+demographics&hl=en&as_sdt=0,14" rel="nofollow">http://scholar.google.com/scholar?as_ylo=2008&q=reddit+d...</a>), which surprises me given its size and unique subreddit system.<p>4. If you want to examine the spread of memes, you need access to old threads. <a href="http://stattit.com/" rel="nofollow">http://stattit.com/</a> is the best way of getting around the reddit API's limit of the 1000 most recent posts.<p>5. Last month, a similar data set (which only looked at reddit) was collected. I think you're trying to do something different and your presentation is much better, but you might be interested in the discussion: <a href="http://www.reddit.com/r/TheoryOfReddit/comments/126pth/scraped_110k_comments_from_45000_users_in_527/" rel="nofollow">http://www.reddit.com/r/TheoryOfReddit/comments/126pth/scrap...</a>