TechEcho

7 comments

djoerd, about 12 years ago

While I know that some of the pages of my home page are in the crawl, they do not show up with the following query: http://urlsearch.commoncrawl.org/?q=nl.utwente.cs.wwwhome%2F~hiemstra nor with: http://urlsearch.commoncrawl.org/?q=nl.utwente.cs.wwwhome%2F%7Ehiemstra (no, this is not only an ego search problem ;-) )

LisaG, about 12 years ago

I hope that some of you who use/play around with the Common Crawl data will try out using the JSON files from the URL Search and then share your code.

If you didn't see the details in the blog post, Common Crawl is giving out $100 in AWS credit to the first five people who share code that incorporates a JSON file from the URL Search.

LisaG, about 12 years ago

From @djoerd: Why does @CommonCrawl URL search (http://urlsearch.commoncrawl.org/) need 'tld.domain' format rather than 'domain.tld'? Read Google's BigTable paper.

frederi, about 12 years ago

Why can't they just write code that reverses the input?
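
For readers who want to do the reversal themselves before querying, here is a minimal sketch. It assumes the key format seen in the queries earlier in the thread (reversed host labels followed by the path, as in `nl.utwente.cs.wwwhome/~hiemstra`); the function name `to_reversed_key` is hypothetical and is not part of the URL Search tool. The reversed form means keys for pages on the same domain sort next to each other.

```python
from urllib.parse import urlsplit


def to_reversed_key(url: str) -> str:
    """Convert a URL into a reversed-host key (assumed format: 'tld.domain.sub/path')."""
    # urlsplit only treats the first component as a host when '//' is present,
    # so add it for bare inputs such as 'google.com'.
    if "//" not in url:
        url = "//" + url
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is lowercased and excludes port/userinfo
    reversed_host = ".".join(reversed(host.split(".")))
    return reversed_host + parts.path


if __name__ == "__main__":
    print(to_reversed_key("http://wwwhome.cs.utwente.nl/~hiemstra"))
    print(to_reversed_key("google.com"))
```

The result would still need to be percent-encoded (for example with `urllib.parse.quote`) before being placed in the `q` parameter of a query URL.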

lubujackson, about 12 years ago

I'd love it if there was a feature to search for a specific URL. Like if "com.google" just loaded the Google homepage if you put it in quotes.

lubujackson, about 12 years ago

Top results for "com" are a little odd. Seems like @ wasn't filtered from the domain part of the URL (though it should be, I would think).

djoerd, about 12 years ago

The first FAQ link seems to be broken (maybe a web server setting gone bad?) BTW, this is a great resource. Thanks for sharing this!

Share code that uses new URL Search tool and win AWS credit