
Obama's New Robots.txt

56 points, by r00k, over 16 years ago

6 comments

tlrobinson, over 16 years ago
The vast majority of the entries in Bush's robots.txt were filtering out the plain-text versions that are linked at the bottom of the HTML versions and contain identical content. This prevents duplicates from showing up in searches, and it is likely done automatically by whatever software they use to manage the content.

Want proof? Pick any of the entries ending in "/text", for example "/911/911day/text", then search Google with the "/text" removed, like this: "site:whitehouse.gov inurl:/911/911day", and you can still see the page in the Google cache (at least until Google's index is updated).

If you want to view it as a metaphor, fine, but there's no evidence that Bush's administration was trying to hide anything on their website, as this article implies. If they wanted to hide it, why would they put it on there in the first place?
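[Editor's note: a minimal sketch, not part of the original thread, showing how a Disallow entry of the kind described in this comment blocks the plain-text duplicate while leaving the HTML page crawlable. The robots.txt body and URLs are illustrative, built around the "/911/911day/text" example above, and are not the actual 2009 file.]

```python
# Sketch only: illustrative robots.txt rules, not the real whitehouse.gov file.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /911/911day/text
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The plain-text duplicate is blocked from crawling...
print(parser.can_fetch("*", "http://www.whitehouse.gov/911/911day/text"))  # False
# ...while the HTML page with the same content stays crawlable.
print(parser.can_fetch("*", "http://www.whitehouse.gov/911/911day/"))      # True
```

Note that excluding duplicate URLs this way only affects crawling and search indexing; the pages themselves stay publicly reachable, which is the commenter's point that nothing was actually being hidden.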
miketheburrito, over 16 years ago
This is a great and semi-metaphorical comparison (woohoo transparency!), but to be fair, the Obama administration hasn't done anything yet, so there isn't even anything to hide at this point.
nir, over 16 years ago
Having /includes/ under the document root - and trying to fix that with a robots.txt entry (??) - wouldn't reflect well on Obama, if these entries actually meant anything :)
gojomo, over 16 years ago
Why aren't we allowed to crawl their JS and CSS?

What are they trying to hide?
dejb, over 16 years ago
I'm more interested in what CMS they are using. Any ideas?
jamesv, over 16 years ago
/firstlady/newborn/text !?