I tested a lot of these services and libraries a while ago as part of developing a product that required extracting article text and metadata from a URL.<p>The best service, by some margin, was Diffbot (www.diffbot.com). I ran comparisons between approximately 20 different services and libraries. It uses machine learning rather than regular expressions or per-site filters, and the engine has been extensively trained (I threw a lot of edge cases at it, which improved it). There seem to be a lot of similar services that do well with common cases but completely fall apart when applied broadly.<p>So to the author of this service: what features or examples do you have that distinguish your implementation from others? What technique is being used here?
Looks nice, we should talk. I run a service that does the same (and more): <a href="http://www.feedsapi.com" rel="nofollow">http://www.feedsapi.com</a>. Where in Switzerland are you based? I was in Bienne a couple of months ago, and I am based in Germany. I will drop you a mail shortly.
Any chance you can expand this as a "real" service, i.e. one with a guaranteed service level for a monthly fee?<p>I would love to use this in an iPhone app I am building, but I am obviously wary as it may disappear/go offline at any point.<p>I would gladly pay a monthly subscription to use it.
I tried this on <a href="https://www.xkcd.com/386/" rel="nofollow">https://www.xkcd.com/386/</a> , but <a href="http://api.thequeue.org/v1/clear?url=https://www.xkcd.com/386/" rel="nofollow">http://api.thequeue.org/v1/clear?url=https://www.xkcd.com/38...</a> just extracted the content disclaimer and Creative Commons license notice at the bottom of the page: "Warning: this comic contains [...] This work is licensed under [...]".
This is great. I made a personal periodical for myself using Readability and it worked, but it was a pain in the ass. This is exactly what I should've built first.