Exploring a ‘Deep Web’ That Google Can't Grasp

20 points by nickb about 16 years ago

4 comments

smanek about 16 years ago
"That haystack is infinitely large."

Gah. I really hate it when people misuse "infinite".
rozim about 16 years ago
Greg Linden's (MSFT) comments on a recent Google paper on this:

http://glinden.blogspot.com/2009/01/how-google-crawls-deep-web.html
jaspertheghost about 16 years ago
There are many startups attempting to do this, including pipl.com and http://cazoodle.com/, among others. Here's some research about it: http://www-sal.cs.uiuc.edu/~kcchang/
sam_in_nyc about 16 years ago
I believe I've seen this type of crawling in action in request logs. For example, Yahoo might try to request "news.ycombinator.com/user?id=britney_spears", even though it's not linked to from anywhere.
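
One rough way to check request logs for this kind of guessed URL is sketched below. It assumes the log has already been parsed into (user agent, URL) pairs and that you have a set of URLs your site actually links to. The function name, the sample data, and the user-agent substrings are all hypothetical choices for illustration, not anything a particular crawler or log format guarantees.

```python
def guessed_requests(log_entries, linked_urls, crawler_markers=("slurp", "googlebot")):
    """Return (agent, url) pairs where a crawler asked for a URL nothing links to.

    log_entries     -- iterable of (user_agent, url) tuples (assumed pre-parsed)
    linked_urls     -- set of URLs known to be linked from somewhere on the site
    crawler_markers -- lowercase substrings treated as crawler user agents
                       (an assumption; adjust for the bots you care about)
    """
    hits = []
    for agent, url in log_entries:
        is_crawler = any(marker in agent.lower() for marker in crawler_markers)
        if is_crawler and url not in linked_urls:
            hits.append((agent, url))
    return hits


# Hypothetical sample data for illustration only.
linked = {"/user?id=nickb", "/item?id=491698"}
entries = [
    ("Mozilla/5.0 (compatible; Yahoo! Slurp)", "/user?id=britney_spears"),
    ("Mozilla/5.0 (compatible; Yahoo! Slurp)", "/user?id=nickb"),
    ("Mozilla/5.0 (X11; Linux) Firefox/3.0", "/user?id=someone_else"),
]

for agent, url in guessed_requests(entries, linked):
    print(url)
```

This only flags candidates: a URL can be missing from the linked set because an external site links to it, so a hit suggests guessing rather than proving it.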