Amazon Web Services: Clouded by Duplicate Content

22 points by grep · almost 15 years ago

8 comments

jauer · almost 15 years ago
Wouldn't it be simpler to use VirtualHost so you only respond with content to requests for your domain?

Then set it up so requests without the domain name get a 301 redirect to the canonical URL.
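
A minimal Apache sketch of jauer's setup (the domain, DocumentRoot, and ServerName values are placeholders, not from the article): the first VirtualHost defined is Apache's default, so it catches every Host header that no named vhost claims.

    # Default vhost, defined first: catches *.amazonaws.com names,
    # raw IPs, and anything else that no other vhost matches.
    <VirtualHost *:80>
        ServerName default.invalid
        # Redirect preserves the path: /foo -> http://www.example.com/foo
        Redirect 301 / http://www.example.com/
    </VirtualHost>

    # Only requests that actually name the site get real content.
    <VirtualHost *:80>
        ServerName www.example.com
        DocumentRoot /var/www/example
    </VirtualHost>

The nice property of the default-vhost approach is that nothing has to be listed twice: any hostname you haven't explicitly configured falls through to the redirect.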

akirk · almost 15 years ago
I don't quite understand why this article doesn't recommend using <link rel="canonical" href="..."> as described at http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html (and, for the cross-domain case, http://googlewebmastercentral.blogspot.com/2009/12/handling-legitimate-cross-domain.html).

Such an easy solution to this problem.

bdb · almost 15 years ago
Sorry, but the author is disqualified by this sentence:

"Now there were no external links to these AWS subdomains but, being a domain registrar, Google was notified of the new DNS entries and went ahead and indexed loads of pages."

dedward · almost 15 years ago
"Now there were no external links to these AWS subdomains but, being a domain registrar, Google was notified of the new DNS entries and went ahead and indexed loads of pages"

Domain registrars wouldn't be notified of new RRs inside a second-level domain; that would be pointless.

I can't see any way they would ever index a URL that used a DNS RR that was brand new. I'd hazard a guess that either the URL was used previously within the cloud and published somewhere, or it was set up as a CNAME in your own DNS, or your main webserver returned it as a response to a googlebot in some fashion at some point.

madssj · almost 15 years ago
I think we would all be better off just using an Elastic IP address and not using the dynamic address for public websites.

Also, the same problem applies to normal servers where the webserver is configured to serve the website for the raw IP address, e.g.:

http://174.132.225.106/

which Google has also picked up:

http://www.google.dk/search?q=site:174.132.225.106
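
A hedged mod_rewrite sketch of catching this raw-IP case (the canonical domain is a placeholder; madssj doesn't give a fix himself): if the Host header is a bare IPv4 address, answer with a 301 to the real name.

    RewriteEngine On
    # Is the Host header a raw IPv4 address (optionally with a port)?
    RewriteCond %{HTTP_HOST} ^\d{1,3}(\.\d{1,3}){3}(:\d+)?$
    # The ^/? lets the rule work in both server and directory context
    RewriteRule ^/?(.*)$ http://www.example.com/$1 [R=301,L]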

bkrausz · almost 15 years ago
Every website should have a similar redirect rule in there somewhere (I implement it in PHP). If someone hits yoursite.com, you probably want to redirect them to www.yoursite.com. I whitelist my domains so that if someone goes to anything that points to my server and isn't a valid subdomain, they get redirected to www.
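
bkrausz's PHP isn't shown; a minimal config-level sketch of the same whitelist idea (yoursite.com stands in for the real domain) could look like:

    RewriteEngine On
    # Any Host that isn't exactly the canonical www name (the bare
    # domain, an amazonaws.com alias, a raw IP) gets 301'd to www.
    RewriteCond %{HTTP_HOST} !^www\.yoursite\.com$ [NC]
    RewriteRule ^/?(.*)$ http://www.yoursite.com/$1 [R=301,L]

Doing this at the webserver layer rather than in PHP also covers requests that never reach the application, such as static files.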

joshu · almost 15 years ago
Horseshit. Learn to configure your webserver.

rlpb · almost 15 years ago
If accessing your web server via .amazonaws.com does not make sense for you, why not just block (with a 403 or a 404) all HTTP requests carrying a Host: *.amazonaws.com header, rather than messing around with rewrites and robots.txt?
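
A minimal sketch of that block in Apache mod_rewrite (assuming Apache; rlpb doesn't name a server): the [F] flag answers 403 with no rewriting involved.

    RewriteEngine On
    # Refuse any request whose Host header ends in .amazonaws.com
    RewriteCond %{HTTP_HOST} \.amazonaws\.com$ [NC]
    RewriteRule ^ - [F]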