I just found this paper in my school's library. It's not new, and it looks like Google, Yahoo, and MSN may have given up on trying to find 'DUST' (different URLs with similar text) themselves, because they now let you declare the canonical URL for a page yourself<p>( <a href="http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html" rel="nofollow">http://googlewebmastercentral.blogspot.com/2009/02/specify-y...</a> ,
<a href="http://ysearchblog.com/2009/02/12/fighting-duplication-adding-more-arrows-to-your-quiver/" rel="nofollow">http://ysearchblog.com/2009/02/12/fighting-duplication-addin...</a> ,
<a href="http://blogs.msdn.com/webmaster/archive/2009/02/12/partnering-to-help-solve-duplicate-content-issues.aspx" rel="nofollow">http://blogs.msdn.com/webmaster/archive/2009/02/12/partnerin...</a> )<p>It's still interesting that<p>1. Many sites have a lot of 'DUST'.<p>2. It is not very hard to find the 'DUST', and finding it reduces crawling time because duplicate URLs don't need to be fetched.