20 points by nickb about 16 years ago

4 comments

smanek about 16 years ago

"That haystack is infinitely large."

Gah. I really hate it when people misuse "infinite".

rozim about 16 years ago

Greg Linden's (MSFT) comments on a recent Google paper on this:

http://glinden.blogspot.com/2009/01/how-google-crawls-deep-web.html

jaspertheghost about 16 years ago

There are many startups attempting to do this, including pipl.com and http://cazoodle.com/, among others. Here's some research about it: http://www-sal.cs.uiuc.edu/~kcchang/

sam_in_nyc about 16 years ago

I believe I've seen this type of crawling in action in request logs. For example, Yahoo might try to request "news.ycombinator.com/user?id=britney_spears", even though it's not linked to from anywhere.

Exploring a ‘Deep Web’ That Google Can't Grasp
