HackerNews API: What if HN does not have API? Make API on the fly with APIfy

132 points by sathish316, almost 13 years ago

16 comments

fizx, almost 13 years ago

Hah! tectonic and I applied to YC with almost exactly this in 2009?!

We went as far as building a browser-based IDE-like environment for generating these, and a language called parsley for expressing the scrapes. If you're interested in this, you could check out some of our related open source libraries:

Edit: I just open-sourced the scraping wiki site we created here: <a href="https://github.com/fizx/parselets_com" rel="nofollow">https://github.com/fizx/parselets_com</a>

<a href="http://selectorgadget.com" rel="nofollow">http://selectorgadget.com</a>
<a href="https://github.com/fizx/parsley" rel="nofollow">https://github.com/fizx/parsley</a>
<a href="https://github.com/fizx/parsley-ruby" rel="nofollow">https://github.com/fizx/parsley-ruby</a>
<a href="https://github.com/fizx/pyparsley" rel="nofollow">https://github.com/fizx/pyparsley</a>
<a href="https://github.com/fizx/csvget" rel="nofollow">https://github.com/fizx/csvget</a>

<pre><code>> cat hn.let
{
  "headlines":[{
    "title": ".title a",
    "link": ".title a @href",
    "comments": "match(.subtext a:nth-child(3), '\\d+')",
    "user": ".subtext a:nth-child(2)",
    "score": "match(.subtext span, '\\d+')",
    "time": "match(.subtext, '\\d+\\s+\\w+\\s+ago')"
  }]
}
> csvget --directory-prefix=./data -A "/x" -w 5 --parselet=hn.let http://news.ycombinator.com/
> head data/headlines.csv
comments,title,time,link,score,user
4,Simpson's paradox: why mistrust seemingly simple statistics,2 hours ago,http://en.wikipedia.org/wiki/Simpson%27s_paradox,41,waldrews
67,America's unjust sex laws,2 hours ago,http://www.economist.com/opinion/displaystory.cfm?story_id=14165460,59,MikeCapone
23,Buy somebody lunch,3 hours ago,http://www.whattofix.com/blog/archives/2009/08/buy-somebody-lu.php,58,DanielBMarkham</code></pre>
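
For readers unfamiliar with parsley, the `match(selector, regex)` entries in the parselet above appear to select an element and then keep the first regex hit in its text. A rough Python analogue of just the regex half might look like the sketch below. It assumes the element text has already been selected by some other means; the `parse_row` helper and the sample strings are hypothetical and are not part of parsley or csvget.

```python
import re


def match(text, pattern):
    """Return the first regex hit in text, or None if nothing matches."""
    found = re.search(pattern, text)
    return found.group(0) if found else None


def parse_row(span_text, comments_link_text, row_text):
    """Apply the parselet's three regexes to pre-selected element text.

    The three arguments stand in for the text of `.subtext span`,
    `.subtext a:nth-child(3)` and `.subtext` respectively (assumed inputs).
    """
    return {
        "score": match(span_text, r"\d+"),
        "comments": match(comments_link_text, r"\d+"),
        "time": match(row_text, r"\d+\s+\w+\s+ago"),
    }


# Hypothetical text for one headline's subtext row.
row = parse_row(
    "41 points",
    "4 comments",
    "41 points by waldrews 2 hours ago | 4 comments",
)
print(row)
```

All values come back as strings, which matches the CSV output shown in the comment.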

pg, almost 13 years ago

HN does have an API: <a href="http://www.hnsearch.com/api" rel="nofollow">http://www.hnsearch.com/api</a>

sathish316, almost 13 years ago

Hacker News content expires every hour. Hacker News Newest links are also available here: <a href="http://apify.heroku.com/resources/4fca651b8526fe0001000002" rel="nofollow">http://apify.heroku.com/resources/4fca651b8526fe0001000002</a>

Other APIs never expire (the expire feature has not been pushed yet).
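
The hourly expiry described here can be pictured as a small time-to-live cache. The sketch below is only an illustration of that idea, not APIfy's actual code; the `ExpiringCache` class, its parameters, and the injectable clock are all assumptions made for the example.

```python
import time


class ExpiringCache:
    """Cache fetched values and refetch them once they are older than the TTL."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the behaviour can be tested
        self._entries = {}           # key -> (stored_at, value)

    def get(self, key, fetch):
        """Return the cached value for key, calling fetch() if missing or stale."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = fetch()
        self._entries[key] = (now, value)
        return value
```

A scraper would call something like `cache.get("hn_newest", scrape_newest)`, where `scrape_newest` is a hypothetical function that performs the actual page fetch.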

Jd, almost 13 years ago

My problem with HackerNews API (having done something like this -- the Hacker News Filter on Github) is that you get throttled after you hit a certain number of HTTP requests and your IP gets banned for a certain amount of time.

So as nice as this is, it simply won't work here for the many people who would like to use near live data on HN.
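
One common way to stay under a request limit like the one described is to enforce a minimum delay between requests on the client side. The sketch below shows that idea only. The thread does not state HN's actual limits, so the interval is a placeholder, and the `Throttle` class is a hypothetical helper rather than part of any tool mentioned here.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock          # injectable for testing
        self._sleep = sleep          # injectable for testing
        self._last = None            # time of the previous call, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self._min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A scraper would call `throttle.wait()` before each HTTP request. This reduces the chance of a ban but cannot guarantee near-live data, which is the limitation the comment points out.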

DanielRibeiro, almost 13 years ago

We have had an HN API for a while now: <a href="http://api.ihackernews.com/" rel="nofollow">http://api.ihackernews.com/</a>

altano, almost 13 years ago

Can you add support for CORS (<a href="http://en.wikipedia.org/wiki/Cross-origin_resource_sharing" rel="nofollow">http://en.wikipedia.org/wiki/Cross-origin_resource_sharing</a>)?

Can you add support for taking an existing JSON API (rather than scraping HTML)? This is useful for APIs that are accessible with neither CORS nor JSONP, APIs that are provided by incompetent mental midgets who don't answer emails or participate in their Google Group (cough MBTA cough).
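
What this request amounts to is a proxy that re-serves an upstream JSON payload in a form browsers can read across origins: either with an `Access-Control-Allow-Origin` header (CORS) or wrapped in a callback (JSONP). The sketch below shows only how such a response could be built. The function name and its parameters are hypothetical, and nothing in the thread says APIfy works this way.

```python
import json
import re

# Conservative pattern for a JavaScript identifier used as a JSONP callback.
_CALLBACK_RE = re.compile(r"[A-Za-z_$][\w$]*", re.ASCII)


def cross_origin_response(payload, callback=None, allow_origin="*"):
    """Return (headers, body) exposing payload via CORS, or JSONP if callback is set."""
    body = json.dumps(payload)
    if callback is None:
        headers = {
            "Content-Type": "application/json",
            "Access-Control-Allow-Origin": allow_origin,
        }
        return headers, body
    # Reject anything that is not a plain identifier to avoid script injection.
    if not _CALLBACK_RE.fullmatch(callback):
        raise ValueError("invalid JSONP callback name")
    return {"Content-Type": "application/javascript"}, f"{callback}({body});"
```

Validating the callback name matters because the JSONP body is executed as script by the requesting page.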

6ren, almost 13 years ago

Seems to be fried. (Popularity is a good sign.)

So, it's basically a web-scraper, but with a JSON API. The API input is limited to a single parameter, that indexes the record to be scraped. The API output is taken from that indexed record, consisting of a set of scraped elements within that record, and presented as JSON, with attributes named as user specified.

Although this is limited to a list of renamed records, it could be extended (if needed), and I really like the concept and UI implementation.

Feedback: As someone who has never used css, I found it very tricky to even duplicate the tutorial: selectors are sensitive to leading and trailing spaces; the selectors given in the tute aren't what's needed (and see BTW below); and often "API call failed: Internal Server Error" indicating a problem with the selector, but not what it is, and ATM service is often "unavailable" :), it's slow switching back and forth between "edit" and "test" (why not include testing on the same page? like HN comment edits: textarea + rendered result); when an attribute is removed, it remains in the JSON (code eg <a href="http://apify.heroku.com/resources/4fcb26d7a06a160001000024" rel="nofollow">http://apify.heroku.com/resources/4fcb26d7a06a160001000024</a>); it takes a long time (30s, 1min) to get a result.

I hate to say it, but it's like my experience with ruby: it takes so much time and effort to get the tool to basically work, that I've used up all my enthusiasm/gumption and have none left for the project I had in mind. But much of this is because of current traffic spike, my ignorance of css, and minor polishing/bugs that can be fixed in vers 1.1 - as I said, I really like the idea and UI.

But a deeper question: why a service, instead of a library? It's cross-language, but has an extra dependency (the service), an extra network jump, processing from many users convening at one point. It's interesting to me, because the world seems to be moving towards services, and this would logically include components that formerly would be libraries. Will this happen? What are the pros and cons? Will Amazon etc provide free computation for users of open-source components, analogous to open-source libraries? Interesting.

BTW: minor typo/bug in active URLs in the tute (<a href="http://apify.heroku.com/tutorial/create" rel="nofollow">http://apify.heroku.com/tutorial/create</a>): an extra "s" in "episodess":

<pre><code>http://apify.heroku.com/api/big_bang_theory_episodess.json
http://apify.heroku.com/api/big_bang_theory_episodess/5.json</code></pre>
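
The model described in this comment (a list of records, attributes renamed by the user, and a single index parameter) is small enough to sketch in library form. The code below is an illustration of that description only and makes several assumptions: the function names are hypothetical, the index is treated as zero-based, and the sample rows are made up. It does not show how APIfy itself is implemented.

```python
import json


def build_records(scraped_rows, attribute_names):
    """Pair each row's scraped values with the user-chosen attribute names."""
    return [dict(zip(attribute_names, row)) for row in scraped_rows]


def render(records, index=None):
    """Return the JSON body for the whole list, or for one indexed record.

    index=None corresponds to a URL like /api/<name>.json and an integer
    index to /api/<name>/<index>.json (zero-based here, by assumption).
    """
    if index is None:
        return json.dumps(records)
    if not 0 <= index < len(records):
        raise LookupError(f"no record at index {index}")
    return json.dumps(records[index])


# Hypothetical scraped values for two records.
rows = [["Pilot", "1"], ["The Big Bran Hypothesis", "2"]]
records = build_records(rows, ["title", "number"])
print(render(records, 1))
```

Used this way there is no extra network hop, which is one side of the service-versus-library trade-off raised in the comment; the service form trades that for being callable from any language.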

roycyang, almost 13 years ago

Looks interesting. I just tried to scrape a sample API but got an error with no further information on why it was broken: <a href="http://apify.heroku.com/resources/4fca83088526fe000100011a/edit" rel="nofollow">http://apify.heroku.com/resources/4fca83088526fe000100011a/e...</a>

jc4p, almost 13 years ago

Is it broken right now? It just says "API call failed: Internal Server Error" when I hit Test API.

There's also a good API which powers my favorite Android HN app over here: <a href="http://hndroidapi.appspot.com/" rel="nofollow">http://hndroidapi.appspot.com/</a>

zafriedman, almost 13 years ago

This might be a stupid question and perhaps I didn't look hard enough on your website, but is this open source? I didn't see a GitHub link anywhere. I'm specifically curious as to how you routed Noko or whatever scraping library you're using to do its thing.

gildas, almost 13 years ago

Does not work with Twitter [1].

"API call failed: Internal Server Error"

[1] <a href="http://apify.heroku.com/resources/4fcb23c5a06a160001000014" rel="nofollow">http://apify.heroku.com/resources/4fcb23c5a06a160001000014</a>

premasagar, almost 13 years ago

Did anyone ever make an API that could read a user's upvoted/saved articles from HN? It would require some kind of login credentials, as the data is not public.

sathish316, almost 13 years ago

If you're creating APIs, please add Attributes. To get quick help on CSS or XPath selectors for attributes, press c or x on the site.

temphn, almost 13 years ago

Does this work for sites that are behind logins? Didn't see anything related to authentication but may have missed it.

Trindaz, almost 13 years ago

Is this related to <a href="http://www.apifydoc.com/" rel="nofollow">http://www.apifydoc.com/</a>?

sinzone, almost 13 years ago

Would be cool if all the APIs created via APIfy were automatically listed on Mashape.com
