Share code that uses new URL Search tool and win AWS credit

17 points by LisaG, about 12 years ago

7 comments

djoerd, about 12 years ago
While I know that some pages of my home page are in the crawl, they do not show up with the following query: http://urlsearch.commoncrawl.org/?q=nl.utwente.cs.wwwhome%2F~hiemstra nor with: http://urlsearch.commoncrawl.org/?q=nl.utwente.cs.wwwhome%2F%7Ehiemstra (no, this is not only an ego search problem ;-) )
LisaG, about 12 years ago
I hope that some of you who use or play around with the Common Crawl data will try out the JSON files from the URL Search and then share your code.

If you didn't see the details in the blog post, Common Crawl is giving out $100 in AWS credit to the first five people who share code that incorporates a JSON file from the URL Search.
LisaG, about 12 years ago
From @djoerd: Why does @CommonCrawl URL search (http://urlsearch.commoncrawl.org/) need 'tld.domain' format rather than 'domain.tld'? Read Google's BigTable paper.
frederi, about 12 years ago
Why can't they just write code that reverses the input?
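For anyone curious what that reversal might look like, here is a minimal sketch. It assumes the key format is the reversed host name followed by the path, which is what the queries earlier in the thread suggest (for example `nl.utwente.cs.wwwhome/~hiemstra`). The function name `reversed_host_key` and the sample URLs are made up for illustration; this is not Common Crawl's actual code. Sorting the keys shows why the BigTable paper stores URLs this way: pages from the same domain end up next to each other.

```python
from urllib.parse import urlsplit


def reversed_host_key(url: str) -> str:
    """Turn 'http://www.example.com/a' into 'com.example.www/a'.

    Hypothetical helper: assumes the index key is the reversed host
    name followed by the URL path, with query and fragment ignored.
    """
    # Accept bare input such as 'google.com' by adding a scheme first.
    parts = urlsplit(url if "://" in url else "http://" + url)
    # .hostname lowercases the host and drops any 'user@' prefix and port.
    host = parts.hostname or ""
    return ".".join(reversed(host.split("."))) + parts.path


# Made-up sample URLs, only to show how sorted keys group by domain.
urls = [
    "http://www.example.com/a",
    "http://blog.example.org/",
    "http://mail.example.com/b",
]
for key in sorted(reversed_host_key(u) for u in urls):
    print(key)
```

With keys in this form, both `example.com` hosts sort together ahead of the `example.org` one, so a prefix search on `com.example` would cover every subdomain in one contiguous range.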
lubujackson, about 12 years ago
I'd love it if there was a feature to search for a specific URL. Like if "com.google" just loaded the Google homepage if you put it in quotes.
lubujackson, about 12 years ago
Top results for "com" are a little odd. Seems like @ wasn't filtered from the domain part of the URL (though it should be, I would think).
djoerd, about 12 years ago
The first FAQ link seems to be broken (maybe a web server setting gone bad?) BTW, this is a great resource. Thanks for sharing this!