The robots.txt from news.ycombinator.com reads as follows:<p><pre><code> User-Agent: *
Disallow: /x?
Disallow: /vote?
Disallow: /reply?
Disallow: /submitted?
Disallow: /submitlink?
Disallow: /threads?
Crawl-delay: 30
</code></pre>
So nominally you should feel free to set up a scraper that crawls one non-disallowed resource every 30 seconds.
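As a rough sketch of how a scraper could check those rules before fetching, Python's standard `urllib.robotparser` can parse the text quoted above directly. The user agent name `my-hn-bot` and the example paths are assumptions for illustration, and nothing is fetched over the network here.

```python
from urllib.robotparser import RobotFileParser

# robots.txt content as quoted above
ROBOTS_TXT = """\
User-Agent: *
Disallow: /x?
Disallow: /vote?
Disallow: /reply?
Disallow: /submitted?
Disallow: /submitlink?
Disallow: /threads?
Crawl-delay: 30
"""

BASE = "https://news.ycombinator.com"
AGENT = "my-hn-bot"  # hypothetical user agent name

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path):
    """Return True if AGENT may fetch BASE + path under the rules above."""
    return parser.can_fetch(AGENT, BASE + path)


# Seconds the rules ask a crawler to wait between requests
delay = parser.crawl_delay(AGENT)

if __name__ == "__main__":
    for path in ("/news", "/item?id=1", "/vote?id=1", "/threads?id=pg"):
        print(path, "allowed" if allowed(path) else "disallowed")
    print("crawl delay:", delay)
```

In a real crawler you would probably call `parser.set_url(...)` and `parser.read()` to load the live file instead of a pasted copy, since the rules can change.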
Just use <a href="https://www.hnsearch.com" rel="nofollow">https://www.hnsearch.com</a>, along with <a href="https://www.hnsearch.com/rss" rel="nofollow">https://www.hnsearch.com/rss</a> and <a href="https://www.hnsearch.com/bigrss" rel="nofollow">https://www.hnsearch.com/bigrss</a> if you want to mimic the front page.<p>There is rarely a need to scrape HN directly, but if you do, make sure your bot is polite (especially with respect to rate limits).
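One way to keep a bot polite is to enforce a minimum gap between requests. The sketch below is only an illustration: the class name `PoliteThrottle` and its parameters are made up here, and the clock and sleep functions are injectable so the logic can be checked without actually waiting.

```python
import time


class PoliteThrottle:
    """Enforce a minimum gap (in seconds) between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Sleep until min_interval has passed since the last call.

        Returns the number of seconds slept (0.0 if no wait was needed).
        """
        slept = 0.0
        if self._last is not None:
            remaining = self.min_interval - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        self._last = self._clock()
        return slept
```

Usage would look something like `throttle = PoliteThrottle(30)` followed by a `throttle.wait()` call before each request, where 30 matches the Crawl-delay mentioned elsewhere in this thread.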
Yahoo Pipes would work really well if you're willing to write a few HTML regexes or DOM element selectors.<p><a href="http://pipes.yahoo.com/pipes/" rel="nofollow">http://pipes.yahoo.com/pipes/</a>
Not a full-featured API, but a way to scrape all of HN:
<a href="http://jcla1.com/blog/2013/05/13/crawling-hackernews/" rel="nofollow">http://jcla1.com/blog/2013/05/13/crawling-hackernews/</a><p>Disclaimer: It's my own blog.<p>Edit: It uses HNSearch, so it doesn't violate the robots.txt and can be crawled faster.
You don't even need an API; all you need is an RSS reader pointed at <a href="https://news.ycombinator.com/rss" rel="nofollow">https://news.ycombinator.com/rss</a>
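If you would rather read the feed from a script than from an RSS reader, a minimal sketch with the standard library could look like this. It assumes the feed follows the usual RSS 2.0 layout (`item` elements with `title` and `link` children). The sample document below is made up for illustration and is not real feed output.

```python
import xml.etree.ElementTree as ET

# Made-up sample in RSS 2.0 shape; not real feed output
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First story</title>
      <link>https://example.com/first</link>
    </item>
    <item>
      <title>Second story</title>
      <link>https://example.com/second</link>
    </item>
  </channel>
</rss>"""


def parse_items(rss_text):
    """Return a list of (title, link) tuples, one per <item> in the feed."""
    root = ET.fromstring(rss_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


if __name__ == "__main__":
    for title, link in parse_items(SAMPLE_RSS):
        print(title, link)
```

To use it on the live feed you would download the XML first (for example with `urllib.request.urlopen`) and pass the decoded text to `parse_items`.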
I wrote an alright one in Python for use in my HN app for BlackBerry 10. Not sure how good it is, but check it out here: <a href="https://github.com/krruzic/Reader-YC/tree/master/app" rel="nofollow">https://github.com/krruzic/Reader-YC/tree/master/app</a><p>I'm not sure what you're trying to do, though. I used BeautifulSoup because I couldn't get lxml working on BB10, but switching it to lxml would make it much faster.
<a href="http://hnapp.com/" rel="nofollow">http://hnapp.com/</a>: this is the best HN scraper site. It returns data in JSON or RSS format.
Depending on what you're trying to do with the data, you may find <a href="http://diffbot.com/products/automatic/" rel="nofollow">http://diffbot.com/products/automatic/</a> helpful for getting the clean article text and categorization in JSON format. It can be used as a complement/augmentation to the great suggestions here for getting the links.<p>Disclosure: Founder of Diffbot here.
I wrote a Python wrapper for the iHackerNews API, if that helps.<p><a href="https://github.com/dmpayton/python-ihackernews" rel="nofollow">https://github.com/dmpayton/python-ihackernews</a>
There's a Twitter feed based on HN:
<a href="https://twitter.com/newsycombinator" rel="nofollow">https://twitter.com/newsycombinator</a><p>You can use the Twitter API and read from there.
I have a Scrapy-based crawler project available at <a href="http://github.com/mvanveen/hncrawl" rel="nofollow">http://github.com/mvanveen/hncrawl</a>
Can anyone tell me how to get <a href="https://news.ycombinator.com/news" rel="nofollow">https://news.ycombinator.com/news</a> through the HNSearch API?
I'm after the API link (base URL: <a href="http://api.thriftdb.com/api.hnsearch.com/" rel="nofollow">http://api.thriftdb.com/api.hnsearch.com/</a>).
I wrote <a href="http://scrape.it" rel="nofollow">http://scrape.it</a> and <a href="http://scrape.ly" rel="nofollow">http://scrape.ly</a> to do this.