TechEcho

I just finished crawling 5.19B web pages, Ask Me Anything

19 points by dor_jack about 8 years ago
I WAS JUST RATE-LIMITED BY HN, SO I'M GOING TO ANSWER YOUR QUESTIONS UNDER A NEW ACCOUNT: dor_jack_2

7 comments

grzm about 8 years ago
If you're rate-limited, you can contact the mods via the Contact link in the footer.
dm_i386 about 8 years ago
What tools did you use? What had to be custom-written and why?
maurtinshkreli about 8 years ago
How much did it cost?
tlack about 8 years ago
What did you do to avoid winding up in endless GET URL loops? How deep did you get per site, and how did you schedule follow-up requests?
joshpen188 about 8 years ago
Why didn't you use Common Crawl instead?
savethefuture about 8 years ago
What did you discover?
itburnslikeice about 8 years ago
But why?