Googlebot now makes POST requests via AJAX

146 points by avirambm, over 13 years ago

12 comments

thezilch, over 13 years ago
We have witnessed a Google "bot" use one of our AJAX requests, which is strange considering our robots.txt blocks all of our AJAX requests: it disallows requests on /remote/, which all of our "remote" (AJAX) requests are proxied through. Also, the request only happens (automatically) for a user after they have made a POST to another form. The latter requires POST data, whereas the POST in question could really be a GET; our AJAX wrapper requires an explicit use of GET.

Nonetheless, there have been some more recent Google "bots," including the one behind the POST in question, that may be of interest to those who track those metrics and want to remove those requests from internal reports.

Requests:

<pre><code>
74.125.78.83 "POST /remote/poll/230378/demographics/ HTTP/1.0" 200 6469 "http://www.sodahead.com/united-states/would-you-like-to-wish-my-daughter-hannahgirl-a-happy-birthday/question-230378/?page=2 "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/534.51 (KHTML, like Gecko; Google Web Preview) Chrome/12.0.742 Safari/534.51" www.sodahead.com
74.125.78.83 "GET /entertainment/are-cigarettes-destroying-adeles-voice/question-2212639/ HTTP/1.0" 200 9240 "-" "Mozilla/5.0 (Linux; U; Android 2.3.4; generic) AppleWebKit/534.51 (KHTML, like Gecko; Google Web Preview) Version/4.0 Mobile Safari/534.51" m.sodahead.com
74.125.78.83 "GET /united-states/how-the-white-house-public-relations-campaign-on-the-oil-spill-is-harming-the-actual-clean-up/blog-367099/ HTTP/1.0" 200 39784 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/534.51 (KHTML, like Gecko; Google Web Preview) Chrome/12.0.742 Safari/534.51" www.sodahead.com
74.125.78.83 "GET /living/white-tea-natural-fat-burner-will-you-take-it-and-get-ready-for-the-summer/question-362433/ HTTP/1.0" 200 14260 "http://translate.google.com.eg/translate_p?hl=ar&prev=/search%3Fq%3DWHITE%2BTEA%2BFAT%2BBURNER%26hl%3Dar%26biw%3D1024%26bih%3D634%26prmd%3Dimvns&sl=en&u=http://www.sodahead.com/living/white-tea-natural-fat-burner-will-you-take-it-and-get-ready-for-the-summer/question-362433/&usg=ALkJrhhLw3GWfeKOfwKa0CK-pbsDlRuEXA "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.220 Safari/535.1,gzip(gfe) (via translate.google.com)" www.sodahead.com
74.125.78.83 "GET /living/do-you-think-too-much-about-death/question-1785829/ HTTP/1.0" 302 20 "-" "Mozilla/5.0 (Linux; U; Android 2.3.4; generic) AppleWebKit/534.51 (KHTML, like Gecko; Google Web Preview) Version/4.0 Mobile Safari/534.51" www.sodahead.com
74.125.78.83 "GET /entertainment/kim-kardashian-boobs-too-big/question-2168867/ HTTP/1.0" 200 12680 "-" "Mozilla/5.0 (en-us) AppleWebKit/534.14 (KHTML, like Gecko; Google Wireless Transcoder) Chrome/9.0.597 Safari/534.14" www.sodahead.com
</code></pre>

UserAgents (Google "bots"):

<pre><code>
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/534.51 (KHTML, like Gecko; Google Web Preview) Chrome/12.0.742 Safari/534.51
Mozilla/5.0 (Linux; U; Android 2.3.4; generic) AppleWebKit/534.51 (KHTML, like Gecko; Google Web Preview) Version/4.0 Mobile Safari/534.51
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/534.51 (KHTML, like Gecko; Google Web Preview) Chrome/12.0.742 Safari/534.51
Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.220 Safari/535.1,gzip(gfe) (via translate.google.com)

mobile request against non-mobile site:
Mozilla/5.0 (Linux; U; Android 2.3.4; generic) AppleWebKit/534.51 (KHTML, like Gecko; Google Web Preview) Version/4.0 Mobile Safari/534.51
Mozilla/5.0 (en-us) AppleWebKit/534.14 (KHTML, like Gecko; Google Wireless Transcoder) Chrome/9.0.597 Safari/534.14
</code></pre>
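For anyone who wants to drop these requests from internal reports, here is a minimal sketch of one way to do it. It assumes the log layout shown in the comment above and treats the three user-agent substrings quoted there as markers; the function names and the regular expression are hypothetical, not part of any existing tool. Because the referer quoting in the sample lines is unbalanced, the sketch searches the whole line for a marker instead of trying to isolate the user-agent field.

```python
import re

# Marker substrings taken from the user-agent strings quoted in the comment.
# Whether these cover every Google fetcher is an assumption.
GOOGLE_FETCHER_MARKERS = (
    "Google Web Preview",
    "Google Wireless Transcoder",
    "via translate.google.com",
)

# Hypothetical pattern for the start of each log line:
# ip "METHOD path protocol" status bytes ...
REQUEST_RE = re.compile(
    r'^(?P<ip>\S+) "(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+)'
)


def google_marker(line):
    """Return the first marker found anywhere in the line, or None."""
    for marker in GOOGLE_FETCHER_MARKERS:
        if marker in line:
            return marker
    return None


def split_traffic(lines):
    """Split log lines into (other_traffic, google_fetcher_traffic)."""
    other, google = [], []
    for line in lines:
        (google if google_marker(line) else other).append(line)
    return other, google


def summarize(line):
    """Return (method, path, status) for a line, or None if it does not parse."""
    match = REQUEST_RE.match(line)
    if match is None:
        return None
    return match.group("method"), match.group("path"), int(match.group("status"))
```

Running `split_traffic` over a log file and reporting only on the first list would exclude the requests above; `summarize` makes it easy to spot which of the excluded requests were POSTs.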
mootothemax, over 13 years ago
From the blog post:

"The source of the requests is our client-side JavaScript error tracking code, which installs a global JavaScript error handler and attempts to POST to our server when unhandled errors are detected on the client."

Sounds like Googlebot is executing more advanced JavaScript, though it's pretty scary that it's allowing POSTs to go through.
lm741, over 13 years ago
I noticed this a few days ago. I'm actually considering POSTing the screen dimensions and a few other browser properties for Googlebot via JS.

https://twitter.com/#!/lm741/status/122378906669023232
DanielStarling, over 13 years ago
This is a bit annoying. My company has an IRC bot that notifies us when someone fails to properly fill out an important AJAX-POSTed form on our front page, and we soon found out that all the "errors" we were seeing on IRC were generated by Googlebot.
Bloodwine, over 13 years ago
No crawler should perform POST requests. It is simply bad etiquette, and it is understood that POST requests are typically used to create, change, or delete content or otherwise affect the environment.
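The convention behind this comment can be sketched in a few lines. The HTTP specification (RFC 7231, section 4.2.1) defines GET, HEAD, OPTIONS, and TRACE as "safe", meaning read-only; POST is not among them. The guard below is a hypothetical illustration of a server refusing unsafe methods from a client it has already identified as a crawler. The function names and the choice of a 403 status are assumptions, not something described in the thread.

```python
# Methods defined as "safe" (read-only) in RFC 7231, section 4.2.1.
SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "TRACE"})


def is_safe_method(method: str) -> bool:
    """True if the HTTP method is defined as safe (read-only)."""
    return method.upper() in SAFE_METHODS


def crawler_guard_status(method: str, is_crawler: bool) -> int:
    """Status a hypothetical handler would return before doing any work.

    Crawlers are refused (403, an assumed choice) on unsafe methods;
    everything else is allowed to proceed (200).
    """
    if is_crawler and not is_safe_method(method):
        return 403
    return 200
```

How `is_crawler` gets decided (user-agent matching, IP ranges, or something else) is left open here on purpose.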
evan0202, over 13 years ago
I wonder what this means for SEO. If they are actually fully rendering pages with JavaScript, I guess that means you have to be a lot more careful about how pages are laid out.
threepointone, over 13 years ago
As a corollary, does this mean that Googlebot now reads pages generated by JavaScript? I remember that you needed to follow their AJAX guidelines, as well as generate the actual page on the server, but if they're able to run JavaScript on pages now, does this mean they let the page render first (or with a delay of some sort) before parsing it?

That would be cool.
nostromo, over 13 years ago
I wonder if you could add your POST URLs to robots.txt if you don't want the crawler to access them.

If other crawlers start doing this, it should probably be added to robots.txt formally.
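As a rough illustration of this idea, the sketch below uses Python's standard `urllib.robotparser` to check what a rule like the `/remote/` disallow mentioned earlier in the thread would mean for a compliant crawler. The robots.txt content and helper names are assumptions made for the example. Note that robots.txt rules match URL paths only and have no notion of HTTP method, so this works only when POST endpoints sit under their own path prefix, and it relies on the crawler choosing to honor the file.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content, modeled on the /remote/ rule described
# earlier in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /remote/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """True if a compliant crawler with this user agent may request the URL."""
    return parser.can_fetch(user_agent, url)
```

With these rules, `may_fetch(build_parser(ROBOTS_TXT), "Googlebot", url)` is false for any URL whose path starts with `/remote/`, regardless of whether the request would be a GET or a POST.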
mariust, over 13 years ago
According to Google, they can read your AJAX to some extent, but you have to follow some guidelines. http://www.google.com/support/webmasters/bin/answer.py?answer=81766
consultutah, over 13 years ago
That's a little spooky. I wonder how many people have AJAX POSTs for handling deletes?
mkopinsky, over 13 years ago
Does it only do POSTs that are in the document's onload handler, or also things that are in onclick? I think that the latter could be dangerous:

<a href="#" onclick="$.post(etc)">Delete</a>
eddieplan9, over 13 years ago
Would it click Facebook Like, too? ;)