TechEcho
Show HN: Download the first 10,002,378 HN comments/stories as one archive

90 points by cdman almost 10 years ago
Magnet link: magnet:?xt=urn:btih:44c65b5779d9d8021e002584fa73740f36d052a6&dn=10m_hn_comments_sorted

Go to https://hn-archive.appspot.com/ for the torrent file / source code.

I'll be semi-frequently checking the story and answering any questions which may come up.
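
For anyone scripting the download, here is a minimal sketch of pulling the info hash and display name out of the magnet link above, using only Python's standard library. The helper name `parse_magnet` is hypothetical and is not part of the archive's source code. The sketch assumes the link carries a single `xt=urn:btih:` parameter, as this one does.

```python
from urllib.parse import parse_qs

# The magnet link from the post, with the HTML entity decoded.
MAGNET = (
    "magnet:?xt=urn:btih:44c65b5779d9d8021e002584fa73740f36d052a6"
    "&dn=10m_hn_comments_sorted"
)


def parse_magnet(uri):
    """Return the BitTorrent info hash and display name from a magnet URI.

    Hypothetical helper for illustration; handles only the btih form.
    """
    scheme, sep, query = uri.partition("?")
    if scheme != "magnet:" or not sep:
        raise ValueError("not a magnet URI")

    params = parse_qs(query)
    xt = params.get("xt", [""])[0]
    prefix = "urn:btih:"
    if not xt.startswith(prefix):
        raise ValueError("no BitTorrent info hash (xt=urn:btih:...) found")

    return {
        "info_hash": xt[len(prefix):].lower(),
        "name": params.get("dn", [None])[0],
    }


if __name__ == "__main__":
    info = parse_magnet(MAGNET)
    print(info["info_hash"])
    print(info["name"])
```

The info hash identifies the torrent to a client, and the `dn` value is only a suggested display name.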

10 comments

duggan almost 10 years ago
Somehow I can never turn down a data dump, despite never having done much with one.

Some day!
binarymax almost 10 years ago
Thank you for this! I'm training word2vec on it right now - will take several hours.

If anyone else is interested here is the (terrible) code to get it into a prototype format. https://gist.github.com/binarymax/d3691180e65ff7f0dec5
tilt almost 10 years ago
https://hn-archive.appspot.com/

Clickable
theklub almost 10 years ago
Someone should map the use of tech buzzwords over the years. Would be pretty funny to look at.
paulsutter almost 10 years ago
I wish it included upvotes/downvotes. Why are those secret? It would be fun to work on ranking algorithms, and anything effective requires knowing who is doing the up/down voting.
ivan_ah almost 10 years ago
> 10,002,378

What date range does this correspond to? How big is the archive?
orf almost 10 years ago
Does this include [dead] comments?
callum85 almost 10 years ago
Which pieces of data are included with each comment/link?
toomuchtodo almost 10 years ago
What license applies to the archive? Creative Commons?
sitkack almost 10 years ago
Metadata request: can someone scrape the tracker and provide a log of all the IPs that participated in the swarm?