
HackerNews API: What if HN does not have API? Make API on the fly with APIfy

132 points by sathish316, almost 13 years ago

16 comments

fizx, almost 13 years ago
Hah! tectonic and I applied to YC with almost exactly this in 2009!

We went as far as building a browser-based IDE-like environment for generating these, and a language called parsley for expressing the scrapes. If you're interested in this, you could check out some of our related open source libraries:

Edit: I just open-sourced the scraping wiki site we created here: https://github.com/fizx/parselets_com

http://selectorgadget.com
https://github.com/fizx/parsley
https://github.com/fizx/parsley-ruby
https://github.com/fizx/pyparsley
https://github.com/fizx/csvget

    > cat hn.let
    {
      "headlines": [{
        "title": ".title a",
        "link": ".title a @href",
        "comments": "match(.subtext a:nth-child(3), '\\d+')",
        "user": ".subtext a:nth-child(2)",
        "score": "match(.subtext span, '\\d+')",
        "time": "match(.subtext, '\\d+\\s+\\w+\\s+ago')"
      }]
    }

    > csvget --directory-prefix=./data -A "/x" -w 5 --parselet=hn.let http://news.ycombinator.com/

    > head data/headlines.csv
    comments,title,time,link,score,user
    4,Simpson's paradox: why mistrust seemingly simple statistics,2 hours ago,http://en.wikipedia.org/wiki/Simpson%27s_paradox,41,waldrews
    67,America's unjust sex laws,2 hours ago,http://www.economist.com/opinion/displaystory.cfm?story_id=14165460,59,MikeCapone
    23,Buy somebody lunch,3 hours ago,http://www.whattofix.com/blog/archives/2009/08/buy-somebody-lu.php,58,DanielBMarkham
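For readers who want to try the same extraction without installing parsley/csvget, here is a rough Python equivalent of that parselet. It is only a sketch: it assumes the requests, lxml and cssselect packages, and it reuses the CSS selectors from the comment above, which targeted the 2012-era HN markup and may not match the page today.

    import csv
    import requests
    from lxml import html

    # Fetch and parse the front page (selectors copied from the hn.let parselet above).
    doc = html.fromstring(requests.get("https://news.ycombinator.com/").text)

    rows = [
        {"title": a.text_content(), "link": a.get("href")}
        for a in doc.cssselect(".title a")
    ]

    # Write a CSV similar in spirit to the csvget output shown above.
    with open("headlines.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["title", "link"])
        writer.writeheader()
        writer.writerows(rows)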
pg, almost 13 years ago
HN does have an API: http://www.hnsearch.com/api
sathish316, almost 13 years ago
Hacker News content expires every 1 hour.

Hacker News Newest links are also available here: http://apify.heroku.com/resources/4fca651b8526fe0001000002

Other APIs never expire (the Expire feature hasn't been pushed yet).
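The hourly expiry described here is essentially a time-to-live cache sitting in front of the scraper. A minimal sketch of that idea in Python (a hypothetical helper, not APIfy's actual code):

    import time

    CACHE_TTL_SECONDS = 3600          # expire cached scrapes after one hour
    _cache = {}                       # url -> (fetched_at, scraped_result)

    def get_cached(url, scrape):
        """Return a cached result while it is still fresh, otherwise re-scrape."""
        entry = _cache.get(url)
        if entry and time.time() - entry[0] < CACHE_TTL_SECONDS:
            return entry[1]
        result = scrape(url)          # `scrape` is whatever fetches and parses the page
        _cache[url] = (time.time(), result)
        return result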
Jd, almost 13 years ago
My problem with a HackerNews API (having done something like this -- the Hacker News Filter on GitHub) is that you get throttled after you hit a certain number of HTTP requests, and your IP gets banned for a certain amount of time.

So as nice as this is, it simply won't work here for the many people who would like to use near-live data on HN.
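The usual client-side workaround for this kind of throttling is to space requests out and back off when the site starts refusing them, rather than polling aggressively. A minimal sketch (assuming the requests package; the intervals are illustrative, not HN's actual limits):

    import time
    import requests

    def polite_get(url, min_interval=5.0, retries=3):
        """Fetch a page while pacing requests and backing off on failures."""
        for attempt in range(retries):
            resp = requests.get(url, timeout=10)
            if resp.status_code == 200:
                time.sleep(min_interval)              # pace successive calls
                return resp.text
            time.sleep(min_interval * 2 ** attempt)   # exponential backoff on errors
        raise RuntimeError("giving up on " + url)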
DanielRibeiro, almost 13 years ago
We have had an HN API for a while now: http://api.ihackernews.com/
altano, almost 13 years ago
Can you add support for CORS (http://en.wikipedia.org/wiki/Cross-origin_resource_sharing)?

Can you add support for taking an existing JSON API (rather than scraping HTML)? This is useful for APIs that are neither accessible with CORS nor JSONP -- APIs provided by incompetent mental midgets who don't answer emails or participate in their Google Group (*cough* MBTA *cough*).
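Both requests boil down to fairly small server-side changes: CORS is mostly a matter of sending the right response header, and passing an existing JSON API through is a plain proxy. A minimal sketch of the combined idea using only the Python standard library (hypothetical, not APIfy's implementation; the upstream URL is a placeholder):

    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.request import urlopen

    UPSTREAM = "https://example.com/some.json"  # placeholder upstream JSON API

    class CorsProxy(BaseHTTPRequestHandler):
        def do_GET(self):
            body = urlopen(UPSTREAM, timeout=10).read()          # fetch the upstream JSON as-is
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Access-Control-Allow-Origin", "*")  # the CORS part
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("", 8080), CorsProxy).serve_forever()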
6ren, almost 13 years ago
Seems to be fried. (Popularity is a good sign.)

So, it's basically a web scraper, but with a JSON API. The API input is limited to a single parameter that indexes the record to be scraped. The API output is taken from that indexed record, consisting of a set of scraped elements within that record, presented as JSON with attributes named as the user specified.

Although this is limited to a list of renamed records, it could be extended (if needed), and I really like the concept and UI implementation.

Feedback: as someone who has never used CSS, I found it very tricky to even duplicate the tutorial:

- selectors are sensitive to leading and trailing spaces;
- the selectors given in the tutorial aren't what's needed (and see BTW below);
- I often got "API call failed: Internal Server Error", which indicates a problem with the selector but not what it is, and at the moment the service is often "unavailable" :);
- it's slow switching back and forth between "edit" and "test" (why not include testing on the same page? like HN comment edits: textarea + rendered result);
- when an attribute is removed, it remains in the JSON (example: http://apify.heroku.com/resources/4fcb26d7a06a160001000024);
- it takes a long time (30s, 1min) to get a result.

I hate to say it, but it's like my experience with Ruby: it takes so much time and effort to get the tool to basically work that I've used up all my enthusiasm/gumption and have none left for the project I had in mind. But much of this is because of the current traffic spike, my ignorance of CSS, and minor polishing/bugs that can be fixed in version 1.1 -- as I said, I really like the idea and UI.

But a deeper question: why a service instead of a library? It's cross-language, but has an extra dependency (the service), an extra network hop, and processing from many users converging at one point. It's interesting to me, because the world seems to be moving towards services, and this would logically include components that formerly would be libraries. Will this happen? What are the pros and cons? Will Amazon etc. provide free computation for users of open-source components, analogous to open-source libraries? Interesting.

BTW: minor typo/bug in the active URLs in the tutorial (http://apify.heroku.com/tutorial/create): an extra "s" in "episodess":

    http://apify.heroku.com/api/big_bang_theory_episodess.json
    http://apify.heroku.com/api/big_bang_theory_episodess/5.json
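To make that input/output shape concrete: the list endpoint returns every scraped record, and appending an index selects a single one. A hedged client-side sketch using the tutorial URLs quoted above (the attribute names in the result are whatever the resource's creator defined, and the service itself may no longer be live):

    import requests

    # URL as it appears in the tutorial, extra "s" and all
    BASE = "http://apify.heroku.com/api/big_bang_theory_episodess"

    all_records = requests.get(BASE + ".json").json()    # every scraped record
    one_record = requests.get(BASE + "/5.json").json()   # only the record at index 5

    # Attributes come back under the names the resource's creator chose,
    # e.g. one_record["title"] or one_record["air_date"] (names here are made up).
    print(len(all_records), one_record)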
roycyang, almost 13 years ago
Looks interesting. I just tried to scrape a sample API but got an error with no further information on why it was broken: http://apify.heroku.com/resources/4fca83088526fe000100011a/edit
jc4p, almost 13 years ago
Is it broken right now? It just says "API call failed: Internal Server Error" when I hit Test API.

There's also a good API which powers my favorite Android HN app over here: http://hndroidapi.appspot.com/
zafriedman, almost 13 years ago
This might be a stupid question and perhaps I didn't look hard enough on your website, but is this open source? I didn't see a GitHub link anywhere. I'm specifically curious as to how you routed Noko or whatever scraping library you're using to do its thing.
gildas, almost 13 years ago
Does not work with Twitter [1].

"API call failed: Internal Server Error"

[1] http://apify.heroku.com/resources/4fcb23c5a06a160001000014
premasagar, almost 13 years ago
Did anyone ever make an API that could read a user's upvoted/saved articles from HN? It would require some kind of login credentials, as the data is not public.
sathish316, almost 13 years ago
If you're creating APIs, please add Attributes. To get quick help on CSS or XPath selectors for attributes, press c or x on the site.
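For anyone unsure what an attribute selector looks like in the two syntaxes: CSS selects the element and you read the attribute in code, while XPath can address the attribute directly. A small illustration (assuming Python with the lxml and cssselect packages; not APIfy's own code):

    from lxml import html

    doc = html.fromstring('<span class="title"><a href="http://example.com">Example</a></span>')

    # CSS: select the element, then read its attribute
    css_href = doc.cssselect(".title a")[0].get("href")

    # XPath: the attribute can be named directly in the expression
    xpath_href = doc.xpath('//span[@class="title"]/a/@href')[0]

    print(css_href, xpath_href)   # both print http://example.com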
temphn, almost 13 years ago
Does this work for sites that are behind logins? Didn't see anything related to authentication but may have missed it.
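For context on what supporting sites behind logins would involve: a scraper typically submits the login form once and then reuses the resulting session cookies for every later request. A generic sketch (assuming the requests package; the URL and form field names are placeholders, not any specific site's):

    import requests

    session = requests.Session()

    # Log in once; the session object keeps the cookies that come back.
    session.post(
        "https://example.com/login",                      # placeholder login endpoint
        data={"username": "me", "password": "secret"},    # placeholder field names
    )

    # Later requests are sent with the authenticated cookies attached.
    protected_html = session.get("https://example.com/saved-items").text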
Trindaz, almost 13 years ago
Is this related to http://www.apifydoc.com/ ?
sinzone, almost 13 years ago
It would be cool if all the APIs created via APIfy were automatically listed on Mashape.com.