Bing search results showing up in Google

156 points by ZeroMinx, over 14 years ago

18 comments

jedsmith, over 14 years ago
With respect to the author, the conclusion here is very flawed.

If you search for Bing in Google, you get Bing all over page 1. If you search for Google in Bing, you get Google all over page 1. That's not the result of Google capturing click stream data from Google Chrome and copying Bing's results, nor is it the result of Microsoft capturing click stream data from IE8 and copying Google's results. That's just the nature of indexing.

As for robots.txt disallowing those URLs, there is *no* standard for robots.txt behavior. I have observed some user agents treat it as case insensitive, and others treat it as case sensitive.

Honestly, this isn't even in the same ballpark as the Google accusations made earlier this week, and it smacks of just *looking* for things to accuse Google of in response to the "Binggate" (ugh, I typed it) drama. Can't we go back to more productive things?

Matt_Cutts, over 14 years ago
The two major issues in this article were:

- Google can see and return links to pages without crawling them. I made a video and a blog post about this a while ago: http://www.mattcutts.com/blog/robots-txt-remove-url/

- URL paths are case sensitive. Bing blocks /search in its robots.txt but not /Search. That's how the /Search URLs got crawled.

In a later edit, the author suggests "It would be fairly trivial for bots to test if the server is IIS (if the server identifies itself as such of course) or to try to retrieve Robots.txt and robots.txt, if those come up as equal then the sever can be assumed to be case insensitive."

The issue of case sensitivity in robots.txt is a long, very nuanced topic. Here's just one example to get you started: at least back in 2007 when we were talking about this amongst ourselves at Google, the web server for developer.apple.com was case-insensitive, but their robots.txt had lines like this:

Disallow: /documentation/quicktime/
Disallow: /documentation/Quicktime/
Disallow: /documentation/QUICKTIME/
Disallow: /documentation/macosx/

Why would they do that? Apparently because Apple wanted the canonical link to be /documentation/QuickTime . Back then, at least 21M robots.txt files on the web had mixed-case paths. If Google started interpreting robots.txt files from servers that claimed to be IIS differently... well, I'll leave it as an exercise to the reader to come up with some of the unexpected bugs and behavior that could result.

I know it's really tempting to write a headline like "Bing search results showing up in Google," but I wish the author had done more research instead of going for a gotcha. Any SEO worth his/her salt could have explained what was going on here.
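
To make the Apple example concrete, here is a minimal sketch of prefix matching against those four Disallow lines under both interpretations. The function `is_blocked` and the page path `/documentation/QuickTime/index.html` are hypothetical and chosen for illustration only; real crawlers apply more rules than a bare prefix check.

```python
def is_blocked(path: str, disallow_rules: list[str], case_sensitive: bool = True) -> bool:
    """Return True if any Disallow rule is a prefix of the path.

    Hypothetical helper: a bare prefix check, not a full robots.txt parser.
    """
    if not case_sensitive:
        # Fold both sides to lowercase to model a case-insensitive crawler.
        path = path.lower()
        disallow_rules = [rule.lower() for rule in disallow_rules]
    return any(path.startswith(rule) for rule in disallow_rules)


# The Disallow lines quoted in the comment above.
APPLE_RULES = [
    "/documentation/quicktime/",
    "/documentation/Quicktime/",
    "/documentation/QUICKTIME/",
    "/documentation/macosx/",
]

# Assumed example page under the canonical mixed-case path.
canonical = "/documentation/QuickTime/index.html"

blocked_strict = is_blocked(canonical, APPLE_RULES, case_sensitive=True)
blocked_folded = is_blocked(canonical, APPLE_RULES, case_sensitive=False)
print(blocked_strict, blocked_folded)
```

Under case-sensitive matching the canonical QuickTime path stays crawlable, while a crawler that folded case would treat it as blocked, which is the kind of unexpected behavior the comment warns about.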

floatingatoll, over 14 years ago
Dear Microsoft, case-sensitivity is important. [1]

m.bing.com/robots.txt says "/search", not "/Search". All of the crawled [2] URLs are "/Search" or "/~/search".

Also, wap.bing.com/robots.txt explicitly "Allow:"s several search pages, which are indexed by Google.

[1] EDIT: Case insensitivity is often important. Above comment notes that some robots are case-insensitive. I suspect Google is not, based on the results.

[2] EDIT: I said indexed, a reply corrects to crawled. Good point, thanks.
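
The /search versus /Search distinction can be checked with Python's standard-library parser. This is only a sketch: the robots.txt text below is a hypothetical one-rule file modeled on the rule described above (the real m.bing.com file is not reproduced here), `ExampleBot` is a made-up user agent, and Python's parser is just one implementation, not necessarily how Googlebot behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt containing only the lowercase rule discussed above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url: str) -> bool:
    """Ask the parser whether a made-up crawler may fetch the URL."""
    return parser.can_fetch("ExampleBot", url)


lower = allowed("http://m.bing.com/search?q=test")
upper = allowed("http://m.bing.com/Search?q=test")
print("lowercase /search allowed:", lower)
print("capitalized /Search allowed:", upper)
```

This parser compares paths by case-sensitive prefix, so the lowercase URL is reported as blocked and the capitalized one as allowed.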

jsnell, over 14 years ago
> never mind that Bing only used its toolbar as a url discovery device

That is obviously untrue, and shows that the author does not understand the issue even superficially. The Google experiment showed that Bing was associating URLs with search terms for no reason other than that Google had done so. You know, like making a search for mbzrxpgjys return rim.com, a URL which we can safely assume Bing was already quite aware of.

ashleyw, over 14 years ago
Aren't /search and /Search considered two different directories when it comes to robots.txt?

haberman, over 14 years ago
Never underestimate the ability of a human being to rationalize.

If I were in the Microsoft camp, I'm sure I would also be grasping at straws to explain why it's totally fine for Bing to use Google's search results. It's human nature to rationalize.

The bottom line is that Bing's index contains associations that it could never have figured out if Google hadn't figured them out first. How many there are, we cannot know. There's no way around the fact that Bing is piggybacking on the work of Google's search engineers.

Is it "good for the customer?" In the short term, it's good for the customer if they can buy $1 bootlegged DVDs. In the long term, it's bad for the consumer if the money goes to bootleggers instead of the people who are doing the actual work.

Think I'm exaggerating the effect of just "1 out of 1000 signals?" This argument would be extremely easy to refute. Stop using Google's results. If it really isn't that significant, then why should it be a problem to stop using it? Just turn it off and let everyone observe that the quality is 99.9% as good as it used to be, and avoid any accusation of copying.

By refusing to turn it off, Microsoft makes it clear that it *is* an important part of their index, and that they have no qualms about having an important part of their index ripped off wholesale from their biggest competitor. Maybe it's a smart business move. But if that's the case, spare us the outrage about being called "copyists."

mukyu, over 14 years ago
We need to get over this partisan "gotcha journalism". No one really benefits from everyone making low-content blog posts with random accusations that make their side 'right' (which just happens to be ad hominem anyway).

moultano, over 14 years ago
This looks like Microsoft is assuming robots.txt is case insensitive?

illdave, over 14 years ago
If you look at http://wap.bing.com/robots.txt, the URLs that Google is returning are actually all set to 'allow', not disallow.

It also looks like m.bing.com/robots.txt blocks /search while their actual URLs are /Search - I guess Googlebot treats robots.txt as case-sensitive.

aristidb, over 14 years ago
robots.txt applies to the source of the links, not the target.

So if, say, http://www.paulgraham.com/ links to http://m.bing.com/search, then http://m.bing.com/robots.txt does not apply to that.

EDIT: If you think this is wrong, please explain it instead of just downvoting me, because I think it is pretty unfair that I lose karma for explaining my interpretation.

dminor, over 14 years ago
Google has likely indexed links to Bing found on *other pages*, rather than on Bing itself. That doesn't mean it followed the links (and it wouldn't, if excluded by robots.txt).

Herring, over 14 years ago
> *never mind that Bing only used its toolbar as a url discovery device, not to 'copy search results'*

Yeah, they just happened to discover high quality URLs on Google. What are the chances?

pmb, over 14 years ago
robots.txt disallows (or did until recently) only "/search". The results shown have "/Search" in the URL. Bing screwed up.

mwg66, over 14 years ago
Bit different.

yaix, over 14 years ago
Shouldn't the headline be "Bing explicitly allowing some results pages to show up in other search engines"?

The wap.bing.com/robots.txt blocks all /search/ and then explicitly allows a few, whatever the reason for that is.

Very weak article, IMHO.
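
A block-everything-then-allow-a-few file like the one described above depends on how a crawler resolves overlapping Allow and Disallow rules. The sketch below assumes the "longest matching prefix wins, Allow wins ties" convention. The rule list and paths are hypothetical and are not taken from the actual wap.bing.com file, and other parsers may resolve conflicts differently (for example, by first match in file order).

```python
def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Resolve Allow/Disallow conflicts by longest matching prefix.

    rules is a list of (directive, prefix) pairs, directive being
    "allow" or "disallow". With no matching rule the path is allowed.
    """
    best_directive, best_prefix = "allow", ""
    for directive, prefix in rules:
        if not path.startswith(prefix):
            continue
        longer = len(prefix) > len(best_prefix)
        tie_goes_to_allow = len(prefix) == len(best_prefix) and directive == "allow"
        if longer or tie_goes_to_allow:
            best_directive, best_prefix = directive, prefix
    return best_directive == "allow"


# Hypothetical rules: block all of /search/, then carve out one subpath.
RULES = [
    ("disallow", "/search/"),
    ("allow", "/search/local"),
]

print(is_allowed("/search/local?q=pizza", RULES))  # the carve-out applies
print(is_allowed("/search/web?q=pizza", RULES))    # only the broad block matches
print(is_allowed("/about", RULES))                 # no rule matches
```

Under this convention the explicitly allowed subpath is crawlable even though its parent directory is blocked, which would be consistent with those results pages turning up in another engine's index.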

maeon3, over 14 years ago
Microsoft is way out of line. Google figures out what content is good by crawling every page and doing the leg work, and Bing copies Google data and displays what Google displays.

Google proved it with the Bing sting. There is absolutely NO reason why Bing should have linked to those documents, other than that they copied off of Google's exam paper.

When students do this, it is called plagiarizing. The smoke getting thrown by MS is just to distract and divert while they scramble to hide what they did.

shareme, over 14 years ago
robots.txt excludes /search, not /Search. Big difference, as 99.99% of the returned results are ../Search*

This is MS's mistake in the robots.txt file, not Google's.

jamesjyu, over 14 years ago
I agree with the author here. I think that Google will come away from this looking combative and childish.