TechEcho

19 comments

nikcub over 13 years ago

I tested a lot of these services and libraries a while ago as part of developing a product that required extracting article text and metadata from a URL.

The best service was Diffbot (www.diffbot.com). I ran comparisons between approximately 20 different services and libraries, and it won by some margin. It uses machine learning rather than regular expressions or per-site filters, and the engine has been extensively trained (I threw a lot of edge cases at it, which improved it). There seem to be a lot of similar services that do well with common cases but completely fall apart when applied broadly.

So, to the author of this service: what features or examples do you have that distinguish your implementation from others? What technique is being used here?

Concours over 13 years ago

Looks nice. We should talk: I run a service that does the same (and more), http://www.feedsapi.com. Where in Switzerland are you based? I was in Bienne a couple of months ago, and I am based in Germany. I will drop you an email shortly.

andysinclair over 13 years ago

Any chance you can expand this as a "real" service, i.e. one with a guaranteed service level for a monthly fee?

I would love to use this in an iPhone app I am building, but I am obviously wary as it may disappear/go offline at any point.

I would gladly pay a monthly subscription to use it.

dabeeeenster over 13 years ago

What text extractor engine are you using?

JoshTriplett over 13 years ago

I tried this on https://www.xkcd.com/386/ , but http://api.thequeue.org/v1/clear?url=https://www.xkcd.com/386/ just extracted the content disclaimer and Creative Commons license notice at the bottom of the page: "Warning: this comic contains [...] This work is licensed under [...]".

lowglow over 13 years ago

Sweet. We should talk. I run a similar project at http://www.rtcool.com/

johncoltrane over 13 years ago

Thanks. You should avoid underlined text for non-links, though.

neiljohnson over 13 years ago

It would be really great if, for shortened links, it also provided the final URL.
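
For readers wondering what "the final URL" would involve: the sketch below follows redirects with Python's standard library and reports where the request ended up. It is only an illustration, not how the service works; `resolve_final_url` and its `open_fn` parameter are hypothetical names, and it assumes the shortener uses ordinary HTTP redirects.

```python
from urllib.request import urlopen


def resolve_final_url(url, open_fn=urlopen, timeout=10):
    """Follow HTTP redirects and return the URL that was finally served.

    open_fn is a hypothetical hook so the opener can be swapped out
    (for example in tests); it defaults to urllib's urlopen.
    """
    # urlopen follows redirects by default; geturl() reports the
    # address of the resource that was actually retrieved.
    with open_fn(url, timeout=timeout) as response:
        return response.geturl()
```

A caller would pass the shortened link and get back the address it redirects to, which could then be returned alongside the extracted text.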

endlessvoid94 over 13 years ago

This is great. I made a personal periodical for myself using readability and it worked, but was a pain in the ass. This is exactly what I should've built first.

digamber_kamat over 13 years ago

Thanks a lot. It needs to improve a bit, I guess, but it's a great beginning. I always wanted such an API.

einarlove over 13 years ago

Just what I've been looking for! Will definitely use it sooner or later.

sidolin over 13 years ago

You might want to stop it from opening local files.
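
One common guard against this kind of problem is to accept only `http` and `https` URLs before fetching anything. The sketch below shows that idea under stated assumptions: `is_fetchable_url` and `ALLOWED_SCHEMES` are hypothetical names, nothing here reflects the service's actual code, and a scheme check alone does not cover requests to internal hosts.

```python
from urllib.parse import urlsplit

# Assumption: only plain web URLs should ever be fetched.
ALLOWED_SCHEMES = {"http", "https"}


def is_fetchable_url(url: str) -> bool:
    """Return True only for http(s) URLs that name a host."""
    parts = urlsplit(url)
    # Rejects file:// URLs, bare filesystem paths (empty scheme),
    # and anything that has no hostname.
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.hostname)
```

With a check like this, a request for `file:///etc/passwd` or a bare path would be refused before any fetch is attempted.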

gillesguillemin over 13 years ago

Thanks man, you quite likely made my day!

n8ji over 13 years ago

any chance you'll add JSON support?

ale55andro over 13 years ago

too awesome! I like it and it couldn't have come at a better time. Made my day as well :)

dragosstancu over 13 years ago

Very cool. I could totally use it in a future iPhone app. Pinterest fever alert! :)

balsamiq over 13 years ago

Hey thanks for using my post as an example! ;)

TamDenholm over 13 years ago

I'm glad there is JSON support. Does anyone else think XML should die a painful death?

robmcm over 13 years ago

Very cool, good work :D

Like Instapaper, but for Developers
