TechEcho

Ask HN: How would you use a small web crawl?

1 point by agencies over 2 years ago
I recently completed a small web crawl including *only* pages linked from HN stories. For example, I would crawl the LWN article linked from https://news.ycombinator.com/item?id=32614633. I used archive.org's Wayback Machine to fetch the copy nearest to the HN submission's timestamp. If archive.org didn't have a copy, I did a direct fetch. The crawl covers about 2.5 million pages.

I would like to publish it for others to use, but I'm not sure how useful it would be. In the HN spirit of validating customers early, I'd like to gauge interest *from those who would actually download and use such a resource* before moving forward. Let me know.
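For readers curious what the "nearest archived copy, else direct fetch" step might look like, here is a minimal sketch. It is not the poster's code. It assumes the Wayback Machine availability endpoint (`https://archive.org/wayback/available`) and assumes the submission time is available as a Unix timestamp (for example, the `time` field of an item in the HN API). The function names `wayback_timestamp`, `availability_query`, `pick_fetch_url`, and `resolve` are hypothetical helpers made up for illustration.

```python
import json
from datetime import datetime, timezone
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the Wayback Machine availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def wayback_timestamp(unix_time: int) -> str:
    """Format a Unix timestamp as the 14-digit YYYYMMDDhhmmss form (UTC)."""
    moment = datetime.fromtimestamp(unix_time, tz=timezone.utc)
    return moment.strftime("%Y%m%d%H%M%S")


def availability_query(page_url: str, unix_time: int) -> str:
    """Build the availability query for the snapshot closest to unix_time."""
    params = urlencode({"url": page_url, "timestamp": wayback_timestamp(unix_time)})
    return f"{AVAILABILITY_ENDPOINT}?{params}"


def pick_fetch_url(page_url: str, availability: dict) -> str:
    """Return the closest snapshot URL if one is reported, else the live URL."""
    closest = availability.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available") and closest.get("url"):
        return closest["url"]
    # No archived copy reported: fall back to a direct fetch of the page.
    return page_url


def resolve(page_url: str, unix_time: int, timeout: float = 30.0) -> str:
    """Ask the availability endpoint, then decide which URL to crawl."""
    with urlopen(availability_query(page_url, unix_time), timeout=timeout) as resp:
        availability = json.load(resp)
    return pick_fetch_url(page_url, availability)
```

Only `resolve` touches the network. The other helpers are pure functions, so the fallback decision can be checked offline against a saved availability response before running anything at the scale of millions of pages.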

no comments