TechEcho

I recently completed a small web crawl that includes only pages linked from HN stories. For example, I would crawl the LWN article linked from https://news.ycombinator.com/item?id=32614633. I used archive.org's Wayback Machine to fetch the copy nearest to the HN submission's timestamp. If the archive didn't have a copy, I did a direct fetch. It's about 2.5 million pages.

I would like to publish it for others to use, but I'm not sure how useful it would be. In the HN spirit of validating customers early, I'd like to gauge interest from those who would actually download and use such a resource before moving forward. Let me know.
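For anyone curious what the "nearest archived copy, else direct fetch" step might look like, here is a minimal sketch and not the actual crawler. It assumes the Wayback Machine availability endpoint (`https://archive.org/wayback/available`, which takes `url` and `timestamp` parameters and returns JSON with an `archived_snapshots.closest` entry). The function names `availability_url`, `closest_snapshot`, and `fetch_page` are made up for illustration, and the submission time is assumed to be a timezone-aware datetime.

```python
import json
from datetime import datetime, timezone
from urllib.parse import urlencode
from urllib.request import urlopen

# Wayback Machine availability endpoint (assumed to be the lookup used here).
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url, submitted_at):
    """Build a query asking for the snapshot closest to submitted_at.

    submitted_at must be a timezone-aware datetime; it is converted to UTC
    and formatted as YYYYMMDDhhmmss.
    """
    stamp = submitted_at.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    query = urlencode({"url": page_url, "timestamp": stamp})
    return AVAILABILITY_ENDPOINT + "?" + query


def closest_snapshot(response):
    """Return the closest snapshot URL from a parsed response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def fetch_page(page_url, submitted_at, timeout=30):
    """Fetch the archived copy if one exists, otherwise fetch the live page.

    Returns (url_actually_fetched, body_bytes).
    """
    with urlopen(availability_url(page_url, submitted_at), timeout=timeout) as resp:
        snapshot = closest_snapshot(json.load(resp))
    target = snapshot or page_url  # fall back to a direct fetch
    with urlopen(target, timeout=timeout) as resp:
        return target, resp.read()
```

Calling `fetch_page("https://example.com/", datetime(2022, 8, 27, tzinfo=timezone.utc))` would return whichever URL was actually fetched along with the raw bytes. A real crawl of this size would also need rate limiting, retries, and error handling, none of which are shown here.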

Ask HN: How would you use a small web crawl?

no comments
