TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

© 2025 TechEcho. All rights reserved.

Share code that uses new URL Search tool and win AWS credit

17 points by LisaG about 12 years ago

7 comments

djoerd about 12 years ago
While I know that some of the pages of my home page are in the crawl, they do not show up with the following query: http://urlsearch.commoncrawl.org/?q=nl.utwente.cs.wwwhome%2F~hiemstra nor with: http://urlsearch.commoncrawl.org/?q=nl.utwente.cs.wwwhome%2F%7Ehiemstra (no, this is not only an ego search problem ;-) )
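The two queries above differ only in how "~" is percent-encoded ("/" is sent as %2F in both). Here is a minimal Python sketch that builds both variants with the standard library. It assumes the `q` parameter takes the reversed-domain key plus path exactly as in those links. `build_query_url` is a hypothetical helper, not part of any Common Crawl tool.

```python
from urllib.parse import quote

# Base URL taken from the links in the comment above.
BASE = "http://urlsearch.commoncrawl.org/"

def build_query_url(key: str, encode_tilde: bool = False) -> str:
    """Build a URL Search query for a reversed-domain key such as
    'nl.utwente.cs.wwwhome/~hiemstra' (hypothetical helper)."""
    # safe="" makes quote() encode "/" as %2F; since Python 3.7 it
    # leaves "~" unescaped because RFC 3986 lists it as unreserved.
    encoded = quote(key, safe="")
    if encode_tilde:
        # Reproduce the second variant, which spells "~" as %7E.
        encoded = encoded.replace("~", "%7E")
    return f"{BASE}?q={encoded}"

# build_query_url("nl.utwente.cs.wwwhome/~hiemstra")
#   -> "http://urlsearch.commoncrawl.org/?q=nl.utwente.cs.wwwhome%2F~hiemstra"
```

Both spellings decode to the same key, so if the index treated them identically, the two queries should return the same results.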
LisaG about 12 years ago
I hope that some of you who use or play around with the Common Crawl data will try using the JSON files from the URL Search and then share your code.

If you didn't see the details in the blog post, Common Crawl is giving out $100 in AWS credit to the first five people who share code that incorporates a JSON file from the URL Search.
LisaG about 12 years ago
From @djoerd: "Why does @CommonCrawl URL search (http://urlsearch.commoncrawl.org/) need 'tld.domain' format rather than 'domain.tld'?" Read Google's BigTable paper.
frederi about 12 years ago
Why can't they just write code that reverses the input?
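Frederi's suggestion amounts to reversing the dot-separated labels of the hostname before running the query. The BigTable paper's Webtable example keys rows by reversed hostname (com.cnn.www) so that pages from the same domain sort into contiguous rows. That is presumably why the URL Search uses the same order. Here is a minimal Python sketch, assuming the key is the reversed hostname followed by the path, as in djoerd's query. `to_reversed_key` is a hypothetical helper.

```python
from urllib.parse import urlsplit

def to_reversed_key(url: str) -> str:
    """Turn 'http://wwwhome.cs.utwente.nl/~hiemstra' into
    'nl.utwente.cs.wwwhome/~hiemstra' (hypothetical helper)."""
    # urlsplit only recognises the host when a scheme is present.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    # .hostname is lowercased and excludes any port or user info.
    host = parts.hostname or ""
    reversed_host = ".".join(reversed(host.split(".")))
    return reversed_host + parts.path
```

For example, `to_reversed_key("www.google.com")` gives `"com.google.www"`, so users could type domains in the familiar order and let the client do the reversal.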
lubujackson about 12 years ago
I'd love it if there were a feature to search for a specific URL: for example, if putting "com.google" in quotes just loaded the Google homepage.
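In an index sorted by reversed-domain key, an unquoted query like "com.google" naturally behaves as a prefix scan. The quoted form requested here would instead be an exact-key lookup. Here is a minimal Python sketch that models the index as an in-memory sorted list, which is an assumption about how it works. The sample keys and the `prefix_scan` and `exact_lookup` names are hypothetical.

```python
import bisect
from typing import List, Optional

# Hypothetical reversed-domain keys, kept sorted as in a BigTable-style index.
KEYS: List[str] = sorted([
    "com.google.www/search",
    "nl.utwente.cs.wwwhome/~hiemstra",
    "com.google.www/",
    "com.googleapis.www/",
    "com.google.mail/inbox",
])

def prefix_scan(keys: List[str], prefix: str) -> List[str]:
    """Return every key that starts with prefix (how an unquoted query behaves here)."""
    start = bisect.bisect_left(keys, prefix)
    matches = []
    for key in keys[start:]:
        if not key.startswith(prefix):
            break  # keys sharing a prefix are contiguous, so stop at the first miss
        matches.append(key)
    return matches

def exact_lookup(keys: List[str], key: str) -> Optional[str]:
    """Return key only if it is stored verbatim (the quoted behaviour requested)."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return keys[i]
    return None
```

The unquoted prefix "com.google" also picks up com.googleapis.www/. Adding a trailing "." to the prefix is one way to stay within the domain. An exact lookup on "com.google.www/" returns only the homepage key.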
lubujackson about 12 years ago
Top results for "com" are a little odd. It seems "@" wasn't filtered from the domain part of the URL (though I would think it should be).
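One plausible source of the stray "@" is URLs that carry user info before the host, such as http://user@host/, combined with a parser that simply takes everything between "://" and the next "/". This is an assumption; the thread does not confirm it. Here is a minimal Python sketch contrasting that naive split with `urllib.parse.urlsplit`, whose `hostname` attribute drops user info and port. `naive_host` and `clean_host` are hypothetical helpers.

```python
from urllib.parse import urlsplit

def naive_host(url: str) -> str:
    """Take everything between '://' and the next '/' (keeps any 'user@' prefix and port)."""
    return url.split("://", 1)[1].split("/", 1)[0]

def clean_host(url: str) -> str:
    """Use urlsplit's hostname, which drops user info and port and lowercases the host."""
    return urlsplit(url).hostname or ""
```

With `"http://someone@example.com:8080/page"`, the naive version yields `"someone@example.com:8080"` and the parsed version yields `"example.com"`. The second form is what should end up in the domain field.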
djoerd about 12 years ago
The first FAQ link seems to be broken (maybe a web server setting gone bad?). BTW, this is a great resource. Thanks for sharing this!