Show HN: Download the first 10,002,378 HN comments/stories as one archive

90 points by cdman, almost 10 years ago
Magnet link: magnet:?xt=urn:btih:44c65b5779d9d8021e002584fa73740f36d052a6&dn=10m_hn_comments_sorted

Go to https://hn-archive.appspot.com/ for the torrent file / source code.

I'll be semi-frequently checking the story and answering any questions which may come up.

10 comments

duggan, almost 10 years ago
Somehow I can never turn down a data dump, despite never having done much with one.
Some day!
binarymax, almost 10 years ago
Thank you for this! I'm training word2vec on it right now - will take several hours.
If anyone else is interested, here is the (terrible) code to get it into a prototype format: https://gist.github.com/binarymax/d3691180e65ff7f0dec5
tilt, almost 10 years ago
https://hn-archive.appspot.com/
Clickable
theklub, almost 10 years ago
Someone should map the use of tech buzzwords over the years. Would be pretty funny to look at.
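A minimal sketch of the buzzword-over-time idea, in case it helps anyone get started. It assumes each archive record is a dict with a `time` field (Unix seconds) and a `text` field; those field names, the helper `buzzword_counts_by_year`, and the sample records are all hypothetical, since the archive's actual schema is not stated in this thread.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime, timezone


def buzzword_counts_by_year(records, buzzwords):
    """Count, per year, how many records mention each buzzword at least once.

    `records` is an iterable of dicts with hypothetical keys `time`
    (Unix seconds) and `text`; adapt these to the archive's real schema.
    """
    # One case-insensitive whole-word pattern per buzzword.
    patterns = {
        word: re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
        for word in buzzwords
    }
    counts = defaultdict(Counter)
    for rec in records:
        year = datetime.fromtimestamp(rec["time"], tz=timezone.utc).year
        text = rec.get("text") or ""  # tolerate missing or empty text
        for word, pattern in patterns.items():
            if pattern.search(text):
                counts[year][word] += 1
    return counts


# Made-up sample records for illustration; not taken from the archive.
sample = [
    {"time": 1199145600, "text": "Rails is the future"},            # 2008-01-01 UTC
    {"time": 1420070400, "text": "Docker and microservices"},       # 2015-01-01 UTC
    {"time": 1420156800, "text": "We moved to Docker last month"},  # 2015-01-02 UTC
]
result = buzzword_counts_by_year(sample, ["docker", "rails"])
```

Counting records that mention a word (rather than total occurrences) keeps one long, repetitive comment from dominating a year. Dividing each count by the number of records in that year would turn this into a rate that is comparable across years.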
paulsutter, almost 10 years ago
I wish it included upvotes/downvotes. Why are those secret? It would be fun to work on ranking algorithms, and anything effective requires knowing who is doing the up/down voting.
ivan_ah, almost 10 years ago
> 10,002,378
What date range does this correspond to? How big is the archive?
orf, almost 10 years ago
Does this include [dead] comments?
callum85, almost 10 years ago
Which pieces of data are included with each comment/link?
toomuchtodo, almost 10 years ago
What license applies to the archive? Creative Commons?
sitkack, almost 10 years ago
Metadata request: can someone scrape the tracker and provide a log of all the IPs that participated in the swarm?