Obama's New Robots.txt

56 points by r00k over 16 years ago

6 comments

tlrobinson over 16 years ago
The vast majority of the entries in Bush's robots.txt were filtering out the plain text versions which are linked at the bottom of the HTML versions containing identical content. This prevents duplicates from showing up in searches. This is likely done automatically by whatever software they use to manage the content.

Want proof? Pick any of the entries ending in "/text", for example "/911/911day/text", search Google with the "/text" removed like this: "site:whitehouse.gov inurl:/911/911day" and you can still see the page in the Google cache (at least until Google's index is updated).

If you want to view it as a metaphor, fine, but there's no evidence Bush's administration was trying to hide anything on their website like this article implies. If they wanted to hide it, why would they put it on there in the first place?
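To make the point about "/text" entries concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt excerpt is hypothetical: only the "/911/911day/text" path comes from the comment above, and the `User-agent: *` line and the `http://whitehouse.gov` base URL are assumptions for illustration, not a copy of the real file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical excerpt. Only the "/911/911day/text" path is taken from the
# comment; the User-agent line is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /911/911day/text
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
base = "http://whitehouse.gov"  # assumed base URL for illustration

# The HTML version is not matched by the rule, so it remains crawlable.
html_allowed = parser.can_fetch("*", base + "/911/911day/")

# The plain text duplicate matches the Disallow prefix, so it is excluded.
text_allowed = parser.can_fetch("*", base + "/911/911day/text")

print(html_allowed, text_allowed)
```

Under these assumptions the rule only excludes the duplicate plain text path. The HTML page with the same content stays open to crawlers, which is consistent with the duplicate-filtering explanation rather than with hiding content.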
miketheburrito over 16 years ago
This is a great and semi-metaphorical comparison (woohoo transparency!), but to be fair, the Obama administration hasn't done anything yet, so there isn't even anything to hide at this point.
nir over 16 years ago
Having /includes/ under document root - and trying to fix this via a robots.txt entry (??) - wouldn't reflect well on Obama, if they actually had any meaning :)
gojomo over 16 years ago
Why aren't we allowed to crawl their JS and CSS?

What are they trying to hide?
dejb over 16 years ago
I'm more interested in what CMS they are using. Any ideas?
jamesv over 16 years ago
/firstlady/newborn/text !?