Diffbot launches a web page classifier API: analyzes a day of Twitter

45 points by miket almost 13 years ago

7 comments

miket almost 13 years ago
Wanted to thank the HN community for all your encouragement. I first released the Diffbot API as a "Show HN:" post last year (http://news.ycombinator.com/item?id=2310852). $2M+ and lots of hard work later, we're powering some of the largest destination sites out there, like StumbleUpon and the new Digg.
ig1 almost 13 years ago
Conceptually I like the product; it's something I would consider paying for. But in practice it doesn't seem to perform that well. It misclassifies things it should get right (an article hosted on Posterous, a YouTube page, Hacker News), and for some queries it just returns results for a completely different webpage.

The page tagging technology looks good though.
dave_sullivan almost 13 years ago
Really like the vision approach to classifying web pages. I've been thinking for a while that Google should add this to their algo (if they haven't already).

Classifying individual parts of pages (as Diffbot seems to be doing) is difficult, but I suspect Google could take screenshots of pages reported as spam or whatever as one class and compare those to screenshots of pages w/ high PR to get a pretty interesting classifier they could use as an extra datapoint. Could be an interesting experiment anyway, using data they've got lying around.
jdangu almost 13 years ago
I see some potential in ad tech.

How does caching work? Is there any focus on security? Multiple geolocations?

I liked the TOS :) "Diffbot.com is made available for personal, non-commercial, and commercial purposes. Services are provided as-is, and we do not make any guarantees on the quality or performance."
jdhuang almost 13 years ago
I thought this was super-clever when I first came across Diffbot last year. Can't wait to see what they come out with next.

Keep it up!
laserDinosaur almost 13 years ago
Wow, pretty cool. I wonder, though, is there much use for it outside of aggregator sites like Digg? Even for a site like Reddit, all the content is already split up into categories by users. While this is really cool, I'm not really seeing much use for it. What are some problems that this will solve?
melipone almost 13 years ago
What are the different types the page classifier returns?