Hacker News Dataset Update October 2016

98 points by aaronhoffman, over 8 years ago

3 comments

yeldarb, over 8 years ago
> Caution: That command took just over 30 hours to complete on my macbook. (it also killed Finder a couple times and I had to disable spotlight on the folder I was saving all the .json files to)

I had a similar job I needed to do a few months ago and used AWS Lambda to massively parallelize the work.

I was able to bring what I estimated would take my laptop 30 days down to about an hour by sharding the work across a large number of small instances.

Might be worth a look if you plan on updating this with any regularity.
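A minimal sketch of the sharding idea described above, assuming the job is a scrape of sequential item IDs from the public Hacker News API. The helper names `shard_id_range` and `item_urls` are hypothetical, and the step that hands each shard to a worker (for example, one AWS Lambda invocation per shard) is deliberately left out, since the comment does not say how that was done.

```python
from typing import Iterator, List, Tuple

# Item endpoint of the public Hacker News API (hosted on Firebase).
ITEM_URL = "https://hacker-news.firebaseio.com/v0/item/{id}.json"


def shard_id_range(first_id: int, last_id: int, shard_size: int) -> List[Tuple[int, int]]:
    """Split the inclusive ID range [first_id, last_id] into contiguous shards.

    Each shard is an inclusive (start, end) pair that an independent worker
    could process. This is an illustrative helper, not part of any library.
    """
    if shard_size <= 0:
        raise ValueError("shard_size must be positive")
    shards = []
    start = first_id
    while start <= last_id:
        end = min(start + shard_size - 1, last_id)
        shards.append((start, end))
        start = end + 1
    return shards


def item_urls(shard: Tuple[int, int]) -> Iterator[str]:
    """Yield the item URL for every ID in one inclusive shard."""
    start, end = shard
    for item_id in range(start, end + 1):
        yield ITEM_URL.format(id=item_id)
```

Because the shards do not overlap and together cover the whole range, workers need no coordination beyond receiving their own `(start, end)` pair, which is what makes this kind of job easy to spread over many small instances.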
aaronhoffman, over 8 years ago
I noticed that the Hacker News dataset published to BigQuery was now a year out of date.

I have created an updated copy and made it available for download.

(This is the last 10 million entries; I can add the rest if people are interested.)
minimaxir, over 8 years ago
Since the Hacker News API (https://github.com/HackerNews/API) used in this scraping is being brought up again, I'll ask a burning question: *is development of the API dead?*

From the commit notes in that repo, the only changes from the initial release *in 2014* are "minor README updates."