TechEcho
I just finished crawling 5.19B web pages, Ask Me Anything
19 points
by
dor_jack
about 8 years ago
I was just rate-limited by HN, so I'm going to answer your questions under a new account: dor_jack_2
7 comments
grzm
about 8 years ago
If you're rate-limited, you can contact the mods via the Contact link in the footer.
dm_i386
about 8 years ago
What tools did you use? What had to be custom-written and why?
maurtinshkreli
about 8 years ago
How much did it cost?
tlack
about 8 years ago
What did you do to avoid winding up in endless GET URL loops? How deep did you get per site, and how did you schedule follow-up requests?
joshpen188
about 8 years ago
Why didn't you use Common Crawl instead?
savethefuture
about 8 years ago
What did you discover?
itburnslikeice
about 8 years ago
But why?