HN2JSON: A ruby gem for HackerNews

43 points by jcla1 over 12 years ago

8 comments

dfc over 12 years ago
Be careful not to hammer the site. Your IP could be added to the blocklist if you are too aggressive:

"Yes, we block IPs that seem to be crawlers ignoring robots.txt. We've always blocked abusive IPs, but I tightened up the blocking a few weeks ago. A lot of people were crawling HN, most of them unnecessarily because they were doing things they could have done more efficiently through HNSearch's API[1]." --pg [2]

[1] http://www.hnsearch.com/api
[2] http://news.ycombinator.com/item?id=3196298
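One way to act on this advice is to enforce a minimum gap between requests. The following is a minimal sketch, not part of HN2JSON: the `PoliteThrottle` class, its parameters, and the interval are all hypothetical, and the right delay should come from the site's robots.txt rather than from this example.

```ruby
# Hypothetical helper: enforces a minimum gap between consecutive requests.
# The clock and sleeper are injectable so the behavior can be checked without waiting.
class PoliteThrottle
  def initialize(min_interval,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 sleeper: ->(seconds) { sleep(seconds) })
    @min_interval = min_interval
    @clock = clock
    @sleeper = sleeper
    @last_request_at = nil
  end

  # Runs the block, first waiting if the previous call was too recent.
  def call
    if @last_request_at
      elapsed = @clock.call - @last_request_at
      wait = @min_interval - elapsed
      @sleeper.call(wait) if wait > 0
    end
    @last_request_at = @clock.call
    yield
  end
end

# Possible usage (the interval value is an assumption, not a documented limit):
#   throttle = PoliteThrottle.new(30)
#   ids.each { |id| throttle.call { HN2JSON.find(id) } }
```

The throttle only spaces requests out; it does not read robots.txt itself, so disallowed paths still need to be avoided separately.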
mmackh over 12 years ago
I've written a script that extracts HN, which anyone is welcome to use. I use it for the Hacker News iPhone app:

http://api.thequeue.org/hn/frontpage.xml
http://api.thequeue.org/hn/new.xml
http://api.thequeue.org/hn/best.xml
markburns over 12 years ago
item = HN2JSON.find 4623690

NoMethodError: undefined method `url=' for #<HN2JSON::Entity:0x007fb84cd63a88>

from /Users/markburns/.rvm/gems/ruby-1.9.3-p194/gems/hn2json-0.0.4/lib/hn2json/parser.rb:92:in `block in get_attrs_post'
from /Users/markburns/.rvm/gems/ruby-1.9.3-p194/gems/hn2json-0.0.4/lib/hn2json/entity.rb:92:in `add_attrs'
from /Users/markburns/.rvm/gems/ruby-1.9.3-p194/gems/hn2json-0.0.4/lib/hn2json/parser.rb:91:in `get_attrs_post'
from /Users/markburns/.rvm/gems/ruby-1.9.3-p194/gems/hn2json-0.0.4/lib/hn2json/entity.rb:71:in `get_attrs'
from /Users/markburns/.rvm/gems/ruby-1.9.3-p194/gems/hn2json-0.0.4/lib/hn2json/entity.rb:56:in `initialize'
from /Users/markburns/.rvm/gems/ruby-1.9.3-p194/gems/hn2json-0.0.4/lib/hn2json.rb:35:in `new'
from /Users/markburns/.rvm/gems/ruby-1.9.3-p194/gems/hn2json-0.0.4/lib/hn2json.rb:35:in `find'
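For readers unfamiliar with this class of error, here is a minimal sketch of how an `undefined method 'url='` failure can arise when attributes are assigned through setters. `SketchEntity` is a hypothetical stand-in and not the gem's actual code; the real cause in hn2json 0.0.4 is not stated in the thread.

```ruby
# Hypothetical stand-in for an entity class; NOT the gem's actual implementation.
class SketchEntity
  attr_accessor :id, :title # note: no :url accessor is defined

  # Assigns each attribute by calling its setter method.
  def add_attrs(attrs)
    attrs.each { |name, value| send("#{name}=", value) }
  end
end

entity = SketchEntity.new
begin
  # :url has no setter, so send("url=", ...) raises NoMethodError.
  entity.add_attrs(id: 4623690, title: "example", url: "http://example.com")
rescue NoMethodError => e
  puts e.message
end
```

In a sketch like this, declaring the missing accessor or checking `respond_to?` before calling `send` would avoid the exception.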
Comment #4623827 not loaded
rdudekul over 12 years ago
Going through the code on GitHub to see how an HN page is parsed was informative. I may use this to create a similar parser in Node.js. My interest is in building an intelligent agent that filters content based on my interests (for example coding, customer acquisition, hiring) and notifies me on a daily or weekly basis.
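The filtering step of such an agent could start as plain keyword matching on titles. The sketch below is written in Ruby to match the gem under discussion, uses made-up sample titles, and assumes stories arrive as hashes with a `:title` key; it is simple substring matching, not anything intelligent.

```ruby
# Hypothetical interest list, taken from the examples in the comment.
INTERESTS = ["coding", "customer acquisition", "hiring"].freeze

# Returns the stories whose title mentions any interest (case-insensitive).
def matching_stories(stories, interests = INTERESTS)
  stories.select do |story|
    title = story[:title].to_s.downcase
    interests.any? { |term| title.include?(term.downcase) }
  end
end

# Made-up sample data for illustration only.
sample_stories = [
  { title: "Hiring your first engineer" },
  { title: "A faster hash table" }
]
matching_stories(sample_stories).each { |story| puts story[:title] }
# prints "Hiring your first engineer"
```

A daily or weekly notification would then run this over freshly fetched stories on a schedule.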
Comment #4624007 not loaded
selvan over 12 years ago
Check out apify - http://apify.heroku.com/resources & scrapify - https://github.com/sathish316/scrapify
Library to scrape HTML content as JSON data.
mvanveen over 12 years ago
I wrote a small, Scrapy-based HN crawler available at http://github.com/mvanveen/hncrawl in case anyone is interested.
qmacro over 12 years ago
Excellent! I know I'm biased but I also know you've put a lot of effort into this. Well done Joseph.
why-el over 12 years ago
Nice work. Does Chronic have to be a runtime dependency?
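For context on the question, RubyGems distinguishes runtime dependencies from development dependencies in the gemspec. The sketch below is a hypothetical gemspec, not hn2json's real one: the gem name, version, and dependency names are placeholders chosen for illustration.

```ruby
require "rubygems"

# Hypothetical specification; names and versions are placeholders.
EXAMPLE_SPEC = Gem::Specification.new do |s|
  s.name    = "example_gem"
  s.version = "0.0.1"
  s.summary = "Illustrates runtime vs. development dependencies"
  s.authors = ["example"]

  # Installed for everyone who installs the gem.
  s.add_runtime_dependency "chronic"
  # Installed only for people working on the gem itself.
  s.add_development_dependency "rspec"
end

puts EXAMPLE_SPEC.runtime_dependencies.map(&:name).inspect
puts EXAMPLE_SPEC.development_dependencies.map(&:name).inspect
```

Whether a library belongs in the first group depends on whether the gem's own code calls it when used, which the thread does not settle.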
Comment #4624037 not loaded