TechEcho

6 comments

owainlewis almost 10 years ago

An example showing how to grab all the stories from the Hacker News homepage: <a href="https://falkor-api.herokuapp.com/api/query?url=https://news.ycombinator.com/news&q=td.title%20a" rel="nofollow">https://falkor-api.herokuapp.com/api/query?url=https://news....</a>
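
A minimal Python sketch of building such a query link in code. It assumes only the `url` and `q` parameters visible in the links in this thread; the helper name `build_query_url` is hypothetical and not part of the project. Percent-encoding the parameters matters once the target URL has its own query string, because an unencoded `&` inside it would be read as the start of a new parameter.

```python
from urllib.parse import urlencode, quote

# Endpoint as it appears in the example links; the parameter names are taken from them too.
FALKOR_QUERY_ENDPOINT = "https://falkor-api.herokuapp.com/api/query"

def build_query_url(target_url: str, selector: str) -> str:
    """Build a ?url=<page>&q=<CSS selector> query link (hypothetical helper)."""
    params = {"url": target_url, "q": selector}
    # quote (not quote_plus) renders spaces as %20, matching the links above;
    # leaving ':' and '/' unescaped keeps the target URL readable, while
    # '?', '=' and '&' inside it are still escaped.
    return FALKOR_QUERY_ENDPOINT + "?" + urlencode(params, safe=":/", quote_via=quote)

print(build_query_url("https://news.ycombinator.com/news", "td.title a"))
```

With the Hacker News URL and `td.title a` it reproduces the link above; a selector such as `img[src]` comes out as `img%5Bsrc%5D`, which a standard server decodes back to the original.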

_jomo almost 10 years ago

Title should probably contain 'Show HN:'? Very interesting though. Just tried scraping Twitter and it works great: <a href="https://falkor-api.herokuapp.com/api/query?url=https://twitter.com/shit_hn_says&q=.tweet-text" rel="nofollow">https://falkor-api.herokuapp.com/api/query?url=https://twitt...</a>

Edit: works great as long as there are no quotes, hashtags, or links in the tweets. Is it possible to include sub-elements?

So basically this is a DOM API in JSON. Simple, but I like it. Any plans to add JSONP support?
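
On the JSONP question: JSONP wraps the JSON response in a call to a function name supplied by the client, so a `<script>` tag on another origin can consume it. A minimal Python sketch of the server side, assuming a hypothetical `callback` query parameter (nothing in the thread says Falkor has one). The callback name is whitelisted so the endpoint cannot be made to echo arbitrary JavaScript.

```python
import json
import re

# Allow only dotted JavaScript identifiers such as "handle" or "app.onResults".
_CALLBACK_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*(?:\.[A-Za-z_$][A-Za-z0-9_$]*)*")

def to_jsonp(payload, callback: str) -> str:
    """Wrap a JSON-serialisable payload as a JSONP response body (hypothetical helper)."""
    if not _CALLBACK_RE.fullmatch(callback):
        # Rejecting anything else stops "callback=alert(1);//" style injection.
        raise ValueError(f"invalid JSONP callback name: {callback!r}")
    return f"{callback}({json.dumps(payload)});"

print(to_jsonp(["first story", "second story"], "handleResults"))
```

The resulting body would be served with `Content-Type: application/javascript` rather than `application/json`.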

getriver almost 10 years ago

A better error message would be helpful. For example, I tried <a href="https://falkor-api.herokuapp.com/api/query?url=https://koding.com/Activity/Public/Liked&q=a" rel="nofollow">https://falkor-api.herokuapp.com/api/query?url=https://kodin...</a> and all I got was "Request failed".
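
One way a scraping endpoint could give more useful errors than a bare "Request failed" is to report which stage failed. A minimal Python sketch, assuming the page is fetched with the standard library's `urllib`; the `error_payload` helper and its JSON field names are hypothetical, not Falkor's actual response format.

```python
from urllib.error import HTTPError, URLError

def error_payload(target_url: str, exc: Exception) -> dict:
    """Turn a failed fetch into a descriptive, JSON-ready error (hypothetical format)."""
    # HTTPError subclasses URLError, so it must be checked first.
    if isinstance(exc, HTTPError):
        # The target site answered, but with an error status (e.g. 403 or 404).
        detail = f"target returned HTTP {exc.code} {exc.reason}"
        status = 502
    elif isinstance(exc, URLError):
        # DNS failure, refused connection, TLS problem, timeout, etc.
        detail = f"could not reach target: {exc.reason}"
        status = 502
    elif isinstance(exc, ValueError):
        # urllib raises ValueError for malformed or unsupported URLs.
        detail = f"invalid request: {exc}"
        status = 400
    else:
        detail = f"unexpected error: {type(exc).__name__}"
        status = 500
    return {"error": "Request failed", "url": target_url, "detail": detail, "status": status}
```

Keeping the generic `"error"` string while adding `detail` would let existing clients keep working while giving users something to act on.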

Jake232 almost 10 years ago

Cool idea. This could easily be extended to support something like a proxy pool; that way you could rate-limit or rotate proxies for a given domain globally, at the server level. It would then apply across all your projects, rather than having to be set up per project.

Adding XPath support as well as CSS selectors would be a good addition.
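
A minimal, single-process Python sketch of the proxy-pool idea: proxies are handed out round-robin, and each target domain gets a minimum interval between requests. The `ProxyPool` class, its parameters, and the proxy strings are hypothetical; a deployment with several server processes would need shared state (a database or cache) instead of an in-memory dict.

```python
import itertools
import time
from urllib.parse import urlsplit

class ProxyPool:
    """Round-robin proxy rotation with a per-domain minimum request interval (sketch)."""

    def __init__(self, proxies, min_interval_s=1.0, clock=time.monotonic):
        if not proxies:
            raise ValueError("need at least one proxy")
        self._proxies = itertools.cycle(proxies)
        self._min_interval = min_interval_s
        self._clock = clock
        self._last_hit = {}  # domain -> clock() value of the last allowed request

    def acquire(self, url):
        """Return the proxy to use for url, or None if its domain is still cooling down."""
        domain = urlsplit(url).hostname or ""
        now = self._clock()
        last = self._last_hit.get(domain)
        if last is not None and now - last < self._min_interval:
            return None  # too soon for this domain; the caller should retry later
        self._last_hit[domain] = now
        return next(self._proxies)
```

Returning `None` instead of sleeping leaves queueing or retrying to the caller, and the injectable `clock` makes the timing easy to test.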

owainlewis almost 10 years ago

An example query that extracts all the images from the Digg.com homepage: <a href="https://falkor-api.herokuapp.com/api/query?url=http://digg.com&q=img[src]" rel="nofollow">https://falkor-api.herokuapp.com/api/query?url=http://digg.c...</a>
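
`img[src]` is a CSS attribute selector: it matches `<img>` elements that have a `src` attribute at all, whatever its value. For illustration, a minimal sketch of the same filter using only Python's built-in `html.parser` (no CSS engine); `ImgSrcCollector` and `image_sources` are hypothetical names.

```python
from html.parser import HTMLParser

class ImgSrcCollector(HTMLParser):
    """Collect the src of every <img> that has a src attribute (what img[src] matches)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so <IMG SRC=...> matches too.
        # Self-closing <img .../> also arrives here via the default handle_startendtag.
        if tag == "img":
            attr_map = dict(attrs)
            if "src" in attr_map:
                self.sources.append(attr_map["src"])

def image_sources(html: str) -> list:
    parser = ImgSrcCollector()
    parser.feed(html)
    parser.close()
    return parser.sources
```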

curiously almost 10 years ago

Pretty interesting. Last year I wrote a web scraping API where you could paste a URL into your browser and download the results, but I took it down to work on another project. You can take a look at what a URL could look like: <a href="https://web.archive.org/web/20140420162639/http://scrape.ly/" rel="nofollow">https://web.archive.org/web/20140420162639/http://scrape.ly/</a>

For example, if you wanted the profiles of the authors of today's stories:

<pre><code> http://scrape.ly/s/{http://news.combination.com} {'ueoma87'}*{'next':'Next Page'}{'karma':'331', 'username':'ueoma87'} </code></pre>

That would have returned the profile of each story's author for today, then yesterday, and so on.
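
The `{'next':'Next Page'}` part of that URL appears to describe following a "Next Page" link to keep collecting results. A minimal Python sketch of such a pagination loop, with fetching and parsing left to a caller-supplied `fetch_page` function (hypothetical, as are the other names here) that returns the items on a page and the next page's URL, or `None` on the last page.

```python
from typing import Callable, Iterator, Optional, Tuple

# A fetched page: the items scraped from it, plus the URL of the next page (or None).
Page = Tuple[list, Optional[str]]

def crawl_pages(start_url: str, fetch_page: Callable[[str], Page], max_pages: int = 10) -> Iterator[list]:
    """Yield each page's items, following 'next' links until there are none."""
    url: Optional[str] = start_url
    seen = set()
    for _ in range(max_pages):
        if url is None or url in seen:
            break  # last page reached, or the 'next' links loop back on themselves
        seen.add(url)
        items, url = fetch_page(url)
        yield items
```

The `max_pages` cap and the `seen` set keep a misbehaving site from making the crawler run forever.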

An open source API for web scraping