An open source API for web scraping

19 points by owainlewis almost 10 years ago

6 comments

owainlewis almost 10 years ago
An example showing how to grab all the stories from the Hacker News homepage:

https://falkor-api.herokuapp.com/api/query?url=https://news.ycombinator.com/news&q=td.title%20a
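The query above follows a simple pattern: the page to fetch goes in `url` and a CSS selector goes in `q`. A minimal Python sketch of building such a URL, assuming the endpoint accepts standard percent-encoded query parameters. The helper name `build_query_url` is made up for illustration, and the response format is left out because the thread does not describe it.

```python
from urllib.parse import urlencode

# Endpoint copied from the example links in this thread.
FALKOR_ENDPOINT = "https://falkor-api.herokuapp.com/api/query"


def build_query_url(page_url: str, selector: str) -> str:
    """Return a query URL asking the API to run `selector` against `page_url`.

    Hypothetical helper: only the `url` and `q` parameter names come from
    the thread.
    """
    # urlencode percent-escapes both values, so selectors containing
    # spaces, '#' or '[' are not misread as part of the outer URL.
    return FALKOR_ENDPOINT + "?" + urlencode({"url": page_url, "q": selector})


hn_query = build_query_url("https://news.ycombinator.com/news", "td.title a")
print(hn_query)
```

Encoding the inner URL matters once it carries its own query string or fragment; otherwise its `&` and `#` characters would be read as part of the outer request.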
_jomo almost 10 years ago
Title should probably contain 'Show HN:'?

Very interesting though. Just tried scraping Twitter and it works great: https://falkor-api.herokuapp.com/api/query?url=https://twitter.com/shit_hn_says&q=.tweet-text

Edit: works great as long as there are no quotes, hashtags, or links in the tweets. Is it possible to include sub-elements?

So basically this is a DOM API in JSON. Simple, but I like it.

Any plans to add JSONP support?
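For context on the JSONP question: JSONP means returning the JSON payload wrapped in a call to a function named by the client, so a `<script>` tag on another origin can load it. A rough sketch of that wrapping step, not taken from the project; `to_jsonp` and the callback handling are assumptions for illustration.

```python
import json
import re

# Accept only dotted JavaScript identifiers such as "cb" or "app.onData",
# so the callback name cannot be used to inject arbitrary script.
_CALLBACK_RE = re.compile(r"[A-Za-z_$][\w$]*(?:\.[A-Za-z_$][\w$]*)*", re.ASCII)


def to_jsonp(payload, callback: str) -> str:
    """Wrap a JSON-serialisable payload in a call to `callback`."""
    if not _CALLBACK_RE.fullmatch(callback):
        raise ValueError(f"invalid JSONP callback name: {callback!r}")
    return f"{callback}({json.dumps(payload)});"


print(to_jsonp(["first tweet", "second tweet"], "handleTweets"))
```

A server doing this would normally also send the body with a JavaScript content type rather than `application/json`.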
getriver almost 10 years ago
A better error message would be helpful. For example, I tried https://falkor-api.herokuapp.com/api/query?url=https://koding.com/Activity/Public/Liked&q=a and all I got was "Request failed".
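To illustrate the kind of error detail being asked for, here is a hedged sketch of mapping a failed upstream fetch to a structured payload instead of a bare "Request failed". It uses Python's `urllib` exception types; the function name and the field names (`error`, `status`, `detail`) are invented for the example and say nothing about how the actual service is built.

```python
import urllib.error


def describe_failure(target_url: str, exc: Exception) -> dict:
    """Turn a fetch exception into an error payload a client can act on."""
    # HTTPError subclasses URLError, so it has to be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        return {
            "error": "upstream_http_error",
            "url": target_url,
            "status": exc.code,
            "detail": str(exc.reason),
        }
    # DNS failures, refused connections and similar network problems.
    if isinstance(exc, urllib.error.URLError):
        return {
            "error": "upstream_unreachable",
            "url": target_url,
            "detail": str(exc.reason),
        }
    # Anything else still reports what went wrong rather than nothing.
    return {
        "error": "request_failed",
        "url": target_url,
        "detail": f"{type(exc).__name__}: {exc}",
    }
```

With a payload like this, a caller could tell apart a page that refused the scraper, a host that could not be reached, and a bad selector.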
Jake232 almost 10 years ago
Cool idea. This could easily be extended to support something like a proxy pool; that way you can rate limit / rotate proxies for a given domain globally at the server level. That way it applies across all your projects, rather than having to do it on a per-project basis.

Adding XPath support as well as CSS selectors would be a good addition.
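The proxy pool idea can be sketched roughly as follows: keep one rotation and one rate-limit timer per target domain, shared by every caller of the server. This is only an illustration of the suggestion, under the assumption of simple round-robin rotation and a fixed minimum interval; the class name, parameters, and proxy names are hypothetical.

```python
import itertools
import time
from urllib.parse import urlsplit


class ProxyPool:
    """Round-robin proxies per domain, with a minimum gap between requests."""

    def __init__(self, proxies, min_interval=1.0, clock=time.monotonic):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._proxies = list(proxies)
        self._min_interval = min_interval
        self._clock = clock          # injectable so the logic can be tested
        self._cycles = {}            # domain -> endless iterator over proxies
        self._next_free = {}         # domain -> earliest time of next request

    def acquire(self, url):
        """Return (proxy, wait_seconds) for the next request to url's domain."""
        domain = urlsplit(url).hostname
        if domain not in self._cycles:
            self._cycles[domain] = itertools.cycle(self._proxies)
        now = self._clock()
        # Wait until the domain's slot is free; zero if it already is.
        wait = max(0.0, self._next_free.get(domain, now) - now)
        # Reserve the following slot relative to when this request will run.
        self._next_free[domain] = now + wait + self._min_interval
        return next(self._cycles[domain]), wait


pool = ProxyPool(["proxy-a:8080", "proxy-b:8080"], min_interval=2.0)
proxy, wait = pool.acquire("http://digg.com/")
print(proxy, wait)
```

The caller is expected to sleep for `wait` seconds before sending the request through `proxy`. Because the state is keyed by domain, traffic to one site does not slow down requests to another.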
owainlewis almost 10 years ago
An example query that extracts all the images from the Digg.com homepage:

https://falkor-api.herokuapp.com/api/query?url=http://digg.com&q=img[src]
curiously almost 10 years ago
Pretty interesting. Last year I wrote a web scraping API you could paste into your browser to download results, but I took it down to work on another project. You can take a look at what a URL could look like.

https://web.archive.org/web/20140420162639/http://scrape.ly/

For example, if you wanted the profiles of the authors of today's stories:

    http://scrape.ly/s/{http://news.combination.com} {'ueoma87'}*{'next':'Next Page'}{'karma':'331', 'username':'ueoma87'}

This would have returned the profile of each story's author for today, yesterday, and so on.