Tech Echo

A tech news platform built with Next.js, offering global tech news and discussions.

© 2025 Tech Echo. All rights reserved.

What are the Hidden Communities of Reddit?

194 points by eli_awry, over 12 years ago

15 comments

DanBC, over 12 years ago
I'm not sure what "hidden" means in the title. See, e.g., (http://www.reddit.com/r/proana). There are a bunch of these closed groups.

The author's work seems really useful for detecting spam. There are some people / bots who post a lot of specialist content. They only ever post links to content on domains that pay when visitors click links. These domains have a lot of ads. There's no other interaction on the site.

_NOT SAFE FOR WORK_:

This user (http://www.reddit.com/user/walfa2) only posts content from sites which pay when viewers see the images. The domains have heavy ad content, with popups etc.

Here's an example domain:

(http://www.reddit.com/domain/img1.picfoco.com/)

Once you find one user you can find a bunch of these domains, and the other users posting to those domains, and thus find a few more domains.

With a bit of tinkering you could show a colour-coded chart of spam domains; of users that only post content from those domains; and of users that never make replies but only make top-level comments.

That could be run once a week and (with human oversight) used to remove content which is not good for reddit.
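The expansion DanBC describes (one known account, then its domains, then the other accounts posting to those domains, then their domains, and so on) is essentially a breadth-first "snowball" over a bipartite user/domain graph. Below is a minimal Python sketch, assuming the link submissions are already available as `(user, domain)` pairs. The function name `snowball`, the `max_rounds` cap, the `skip_domains` allow-list and the toy data are all hypothetical illustrations, not anything from the original analysis or the reddit API.

```python
def snowball(posts, seed_user, max_rounds=5, skip_domains=frozenset()):
    """Grow candidate spam accounts and domains outward from one seed account.

    posts: iterable of (user, domain) pairs, one per link submission (assumed input).
    Alternates users -> domains they link to -> other users linking to those domains.
    Returns (users, domains) as candidates for human review, not verdicts.
    """
    # Index the bipartite user/domain graph in both directions.
    domains_by_user, users_by_domain = {}, {}
    for user, domain in posts:
        domains_by_user.setdefault(user, set()).add(domain)
        users_by_domain.setdefault(domain, set()).add(user)

    users, domains = {seed_user}, set()
    frontier = {seed_user}
    for _ in range(max_rounds):
        # Domains linked by the newest users, minus ones already seen or allow-listed.
        new_domains = set().union(*(domains_by_user.get(u, set()) for u in frontier))
        new_domains -= domains | set(skip_domains)
        domains |= new_domains
        # Other users who link to any of the newly found domains.
        new_users = set().union(*(users_by_domain[d] for d in new_domains)) - users
        if not new_users:
            break
        users |= new_users
        frontier = new_users
    return users, domains


# Hypothetical toy data; "imgur.com" stands in for a mainstream host to skip.
posts = [
    ("user_a", "spam1.example"),
    ("user_b", "spam1.example"),
    ("user_b", "spam2.example"),
    ("user_c", "spam2.example"),
    ("user_c", "imgur.com"),
    ("user_d", "imgur.com"),
]
users, domains = snowball(posts, "user_a", skip_domains={"imgur.com"})
print(sorted(users), sorted(domains))
# → ['user_a', 'user_b', 'user_c'] ['spam1.example', 'spam2.example']
```

Without the allow-list, the shared mainstream host pulls the unrelated `user_d` into the result, which is one reason a real run would need a curated list of popular domains plus the human oversight mentioned above.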
NZ_Matt, over 12 years ago
I vaguely remember that several years back Reddit added the option for users to allow their subreddit and votes data to be used for research purposes, with the hope of building a recommendation engine similar to this. Does anyone know if anything came of that? It would be great if the dataset were publicly available.

Edit: Here are the original threads; I don't think the project got very far. http://www.reddit.com/r/announcements/comments/ddz0s/reddit_wants_your_permission_to_use_your_data_for/

http://www.reddit.com/r/redditdev/comments/dtg4j/want_to_help_reddit_build_a_recommender_a_public/
gurkendoktor, over 12 years ago
OT - both Safari (w/o Flash) and Google Chrome max out all CPU cores as long as this site is open. The visualisation might need an upper limit on the work it is doing per second...
dmix, over 12 years ago
I'd be curious to see the connection between politics/economics and other subreddits.

For example, which subreddits are liberals, conservatives, libertarians, anarchists, etc. likely to follow?

Are liberals commonly in /r/trees? Are libertarians big on /r/economics? Are conservatives avoiding /r/wtf and /r/trees?
the_cat_kittles, over 12 years ago
This is one of only a handful of graphviz-esque visuals that ACTUALLY convey information effectively, as far as I have seen. Not to mention it is really interesting info! Nice work!
razkul, over 12 years ago
Awesome data. Really interesting to look at, and great presentation.

But there are a few things that kinda bother me with this:

The problem I can find with this data is that it isn't a representation of reddit's hidden communities as a whole, just the hidden communities of those who actually post (only 20% of Reddit).

A question I have is whether these are two-way connections between the groups. It's not 100% clear exactly how the analysis is done (perhaps I missed this portion), but could connections between subreddits be generated by a lot of people who post in a very tiny subreddit also posting in a larger subreddit? That would mean that although someone may like Large Subreddit A, they may not like the more specific Subreddit B, even though a lot of people who like Subreddit B like Subreddit A.
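razkul's worry is really about whether the edge weight is symmetric. The thread doesn't say which measure the original analysis used, so this is only an illustration: a directed ratio, |B ∩ A| / |B| ("share of B's posters who also post in A"), next to the symmetric Jaccard index, |A ∩ B| / |A ∪ B|. The function names and the "big"/"tiny" data are hypothetical.

```python
def directed_overlap(posters, src, dst):
    """Share of src's posters who also post in dst: |src ∩ dst| / |src|.

    Not symmetric: a tiny subreddit can send most of its posters to a big one
    while the big one sends almost none back.
    """
    s, d = posters[src], posters[dst]
    return len(s & d) / len(s) if s else 0.0


def jaccard(posters, a, b):
    """Symmetric overlap |A ∩ B| / |A ∪ B|, which hides that direction."""
    union = posters[a] | posters[b]
    return len(posters[a] & posters[b]) / len(union) if union else 0.0


# Hypothetical: 1000 posters in "big"; "tiny" has 10, and 9 of them also post in "big".
posters = {
    "big": {f"u{i}" for i in range(1000)},
    "tiny": {f"u{i}" for i in range(991, 1000)} | {"outsider"},
}
print(directed_overlap(posters, "tiny", "big"))   # → 0.9
print(directed_overlap(posters, "big", "tiny"))   # → 0.009
print(round(jaccard(posters, "big", "tiny"), 4))  # → 0.009
```

A single symmetric weight would show "big" and "tiny" as barely related, while the directed view shows that the link matters a lot from the small subreddit's side and almost not at all from the large one's, which is exactly the one-way case razkul describes.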
msds, over 12 years ago
I did a similar thing with all of the departments of the UW: http://www.sorens.in/posts/2012-8-11-uw-courses
1wheel, over 12 years ago
Really cool! A couple of comments:

1. I'm assuming you downloaded comment threads from the front page of each of the subreddits you looked at and then looked at the subreddits each of the posters had commented in. How many requests did you end up making?

2. Did you hand-select the subreddits you analysed? If so, what criteria were you looking for?

3. Have you thought about doing any more research into this area? I made http://redditgraphs.com/ and was looking into ways of guessing a user's age & gender based on their commenting history. I found some papers about similar sites:

twitter: http://www.aclweb.org/anthology-new/D/D11/D11-1120.pdf

blogspot: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.136.9952&rep=rep1&type=pdf

youtube: http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/38143.pdf (This one looks the most promising; using their methods, treat subreddits as youtube videos to create more accurate profiles of communities and users. They also examine the propagation of speech patterns which capture the spread of some memes.)

Unfortunately, reddit doesn't have user profiles or name-like user names (so there isn't an easily available training set), and I was having difficulties organizing and analyzing the large amount of data I was downloading, so I put the project aside. There has been basically no research done specific to reddit (http://scholar.google.com/scholar?as_ylo=2008&q=reddit+demographics&hl=en&as_sdt=0,14), which is surprising to me because of its size and unique subreddit system.

4. If you want to examine the spread of memes, you need access to old threads. http://stattit.com/ is the best way of getting around the reddit API's limit of the 1000 most recent posts.

5. Last month, a similar data set (which only looked at reddit) was collected. I think you're trying to do something different and your presentation is much better, but you might be interested in the discussion: http://www.reddit.com/r/TheoryOfReddit/comments/126pth/scraped_110k_comments_from_45000_users_in_527/
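The pipeline 1wheel guesses at in point 1 (collect the commenters, then see which other subreddits each of them comments in) boils down to counting shared commenters per pair of subreddits. Here is a minimal Python sketch, assuming the scraping is already done and the data is a list of `(user, subreddit)` pairs; `cooccurrence` and the sample data are hypothetical, not the original author's code.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(comments):
    """Count, for each pair of subreddits, the distinct users who commented in both.

    comments: iterable of (user, subreddit) pairs, e.g. one per scraped comment.
    Returns a Counter keyed by alphabetically ordered (sub_a, sub_b) tuples.
    """
    subs_by_user = {}
    for user, sub in comments:
        # A set, so many comments by one user in one subreddit count once.
        subs_by_user.setdefault(user, set()).add(sub)

    pairs = Counter()
    for subs in subs_by_user.values():
        # Sorting makes ("a", "b") and ("b", "a") the same key.
        pairs.update(combinations(sorted(subs), 2))
    return pairs


# Hypothetical scraped comments.
comments = [
    ("u1", "bicycles"), ("u1", "programming"), ("u1", "bicycles"),
    ("u2", "bicycles"), ("u2", "programming"),
    ("u3", "programming"), ("u3", "cars"),
]
print(cooccurrence(comments).most_common())
# → [(('bicycles', 'programming'), 2), (('cars', 'programming'), 1)]
```

The number of pairs per user grows quadratically with how many subreddits they comment in, and raw counts favour large subreddits, so some normalization by subreddit size would usually follow before drawing a graph from these numbers.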
Kluny, over 12 years ago
Insanely fascinating. Keep working and adding more graphs and stuff. Everyone is going to look for their favorite subreddit first, then see how common it is for members of that subreddit to be into the other things they are into.

For instance, I usually read /r/bicycles, but also programming, motorcycles, cars, and 2xc. How many other people have that unique mix of interests?
TGJ, over 12 years ago
The bottom interactive graph is kinda neat. Setting friction to zero and spring tension and center gravity to the minimum turns the whole thing into a spheroidal structure, much like the accretion of objects in space.
toadi, over 12 years ago
Good work on the visualization of the data. Take a look at http://www.datapointed.net/visualizations/; his visuals are superb.
skadamat, over 12 years ago
I go to UT and am on the FAI newsletter and totally get your emails!
rhizome, over 12 years ago
Adrian Chen thanks you.
mahesh_rm, over 12 years ago
Isn't r/WTF missing from this picture?
jrochkind1, over 12 years ago
Hi Eli, neat work!