Do Not Crawl in the DUST: Different URLs with Similar Text

9 points by quilby, about 16 years ago

1 comment

quilby, about 16 years ago
I just found this paper in my school's library. It's not new, and it looks like Google, Yahoo, and MSN may have given up on trying to find 'DUST' themselves, because they now let site owners specify a canonical URL instead:

http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html
http://ysearchblog.com/2009/02/12/fighting-duplication-adding-more-arrows-to-your-quiver/
http://blogs.msdn.com/webmaster/archive/2009/02/12/partnering-to-help-solve-duplicate-content-issues.aspx

It's still interesting that:

1. Many sites have a lot of 'DUST'.

2. It is not very hard to find the 'DUST', which obviously reduces crawling time.
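
To make the idea of 'DUST' concrete, here is a rough sketch of one naive way to spot it: fingerprint each page's text and group the URLs that collide. This is not the paper's method, only an illustration. The function names (`fingerprint`, `find_dust`), the example URLs, and the whitespace and case normalization are all assumptions made up for this example, and it needs pages that have already been fetched, so on its own it saves no crawling.

```python
import hashlib
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Hash the page text after collapsing whitespace and lowercasing it."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_dust(pages: dict[str, str]) -> list[list[str]]:
    """Return groups of URLs whose text fingerprints are identical."""
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    # Only groups with more than one URL are DUST candidates.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical, already-fetched pages: two URLs serve the same story.
pages = {
    "http://example.com/story?id=1": "Hello  World",
    "http://example.com/story_1": "hello world",
    "http://example.com/about": "About us",
}
dust_groups = find_dust(pages)
```

An exact hash only catches identical text after normalization; pages that differ by a timestamp or an ad block would need a similarity measure such as shingling, which is left out here to keep the sketch short.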