Hacking News Aggregation with Smart Tagging

1 point by samikc, about 13 years ago
How can I get all the news on a particular topic in one place? Right now it isn’t possible. You have to go to five or ten different sites to get news on something. Say you want to know what’s hot today in startups. You might visit Hacker News, TechCrunch and The Next Web, and you do that daily. But say you are interested in JRuby or Ruby. Where would you go to get hot news on them? Yes, you can browse subreddits or join a mailing list that covers them, but you may still miss Ruby news on other websites.<p>So what’s the problem here? We do not have a website that collects content from all the other websites and categorizes it. So how should such a website work? Here is our attempt to solve this very problem. We call it the “News Problem”.<p>This website should do the following:<p>It should aggregate content from various websites. It should allow users to follow one or more topics they are interested in. It should somehow know how to categorize content from other websites and then deliver it to the user.<p>Those were the requirements. And guess what two programmers would do about it? Yeah, code our asses off to build a service like this.<p>So what have we done since last summer?<p>We (Vinod and I) like working with each other, and over time we have built a microblogging site, ScoopSpot. So we thought, let us use ScoopSpot as a platform to solve the news problem. So we started off.<p>Problem #1: We need a list of websites to get news from.<p>We love to visit several websites, like Hacker News, TechCrunch etc., so we already had a list. We also felt that, to keep our website clean, the list of websites to crawl should need only minimal editing.<p>Problem #2: How can we get the content?<p>This problem is already solved by RSS feeds. 
So we built an RSS feed reader which crawls the websites periodically.<p>Problem #3: How do we define a topic?<p>We needed a way to define the concept of topics on our website. We chose to define a topic as a tag, and we added the ability to follow a tag. We already supported following people, so it was a natural next move.<p>Problem #4: The big one. How the hell do we know which content belongs where?<p>This is where we have worked hard of late. We needed a system which can read an article and automatically tag it. We had to build a system which understands English, along with a battery of statistical methods to work out what the tags in an article are. We have had some success with this approach. We are not saying that we have solved it completely; work is still going on to reduce the false positives. So we now have a system which gives us news about a lot of our favorite topics. Here are some links with tag names:<p>Ruby: http://www.scoopspot.com/ruby<p>Startup: http://www.scoopspot.com/startup<p>Music: http://www.scoopspot.com/music<p>If you would like to try out ScoopSpot, please log in with your Google/Facebook/Yahoo ID. We are looking forward to your feedback.<p>Thanks for reading.
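For readers who want a feel for the automatic tagging and tag-following ideas in the post, here is a minimal sketch. It is not ScoopSpot's actual method, which the post does not describe in detail. It assumes plain TF-IDF scoring as a stand-in for the "battery of statistical methods", and the names `STOP_WORDS`, `tokenize`, `suggest_tags`, `feed_for` and the sample articles are all made up for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "and", "the", "is", "in", "of", "on", "to", "for", "with"}


def tokenize(text):
    """Lowercase the text, keep alphabetic words, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def suggest_tags(articles, top_k=2):
    """Return {title: [tags]}, picking the top_k words by TF-IDF per article."""
    tokens = {title: tokenize(body) for title, body in articles.items()}
    n_docs = len(tokens)
    # Document frequency: the number of articles each word appears in.
    doc_freq = Counter(w for words in tokens.values() for w in set(words))
    tags = {}
    for title, words in tokens.items():
        if not words:
            tags[title] = []
            continue
        counts = Counter(words)
        # Term frequency times inverse document frequency.
        scores = {
            w: (c / len(words)) * math.log(n_docs / doc_freq[w])
            for w, c in counts.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        tags[title] = ranked[:top_k]
    return tags


def feed_for(followed, tags):
    """Return the titles whose suggested tags overlap the followed tags."""
    return sorted(t for t, ts in tags.items() if followed & set(ts))


# Made-up sample articles, standing in for crawled RSS items.
articles = {
    "JRuby release": "ruby ruby jruby release news",
    "Seed round": "startup funding startup news",
    "New album": "music album music news",
}
tags = suggest_tags(articles)
ruby_feed = feed_for({"ruby"}, tags)
```

A word such as "news" that appears in every article gets a score of zero, which is one simple way to keep generic words from becoming tags. Reducing false positives, as the post mentions, would need more than this.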

No comments yet