Does Google crawl dynamic content?

165 points by pstadler over 9 years ago

20 comments

We built our site, <a href="https://appapp.io" rel="nofollow">https://appapp.io</a> (a search engine for the App Store), as a one-page app. It serves no dynamic content in HTML from the server, so we were unsure to what extent Google would spider/index it. As far as we can tell, it makes no difference compared with generating it server side: <a href="https://www.google.com/search?q=site%3Aappapp.io" rel="nofollow">https://www.google.com/search?q=site%3Aappapp.io</a> So yes, Google definitely does index dynamic content. I would love to know if it ranks it equivalently. Also, Bing does not: <a href="http://www.bing.com/search?q=site%3aappapp.io" rel="nofollow">http://www.bing.com/search?q=site%3aappapp.io</a> (apologies for the minor self-promotion)
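
For anyone who wants to repeat this check on their own domain, here is a minimal sketch of building the two `site:` query URLs linked above. The helper name `site_query_url` is made up for illustration; only the base URLs and the `site:` operator come from the comment.

```python
from urllib.parse import urlencode


def site_query_url(base: str, domain: str) -> str:
    """Build a search URL restricted to one domain via the site: operator."""
    # urlencode percent-encodes the colon, giving q=site%3A<domain>.
    return f"{base}?{urlencode({'q': 'site:' + domain})}"


google_url = site_query_url("https://www.google.com/search", "appapp.io")
bing_url = site_query_url("http://www.bing.com/search", "appapp.io")

print(google_url)
print(bing_url)
```

Comparing the result counts by hand gives only a rough signal of what each engine has indexed; it says nothing about ranking.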

userbinator over 9 years ago

I suppose this explains all the times I've seen a promising search result with the words I was searching for prominently highlighted, then visited the page to find that what I was looking for is no longer there. Sometimes the cached, text-only version has it, and sometimes not. Alternatively, I'll see search results containing none of the words I was searching for, though perhaps they did at some point in the past. Rather annoying.

gPphX over 9 years ago

I have modified wikipedia pages, then googled them, and seen the search results "instantly" updated. Also, sneaky web sites often give different results to the googlebot user agent than to a non-google firefox user agent: <a href="https://en.wikipedia.org/wiki/User_agent" rel="nofollow">https://en.wikipedia.org/wiki/User_agent</a> <a href="https://addons.mozilla.org/en-GB/firefox/search/?q=user+agent&cat=all" rel="nofollow">https://addons.mozilla.org/en-GB/firefox/search/?q=user+agen...</a>
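
A rough sketch of how one might check for this kind of user-agent switching without a browser add-on. The Googlebot string is the commonly published one; the Firefox string, the function names, and the `fetcher` parameter are assumptions made for illustration. Note that a site may verify Googlebot by other means (for example reverse DNS), so a spoofed user agent will not always trigger the alternate content, and pages with timestamps or tokens can differ between any two requests.

```python
import urllib.request
from typing import Callable

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# Illustrative desktop Firefox string; any ordinary browser UA would do.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:42.0) Gecko/20100101 Firefox/42.0"


def fetch(url: str, user_agent: str) -> bytes:
    """Fetch a URL while presenting the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read()


def differs_by_user_agent(url: str, fetcher: Callable[[str, str], bytes] = fetch) -> bool:
    """Return True if the body served to a Googlebot UA differs from a browser UA."""
    as_bot = fetcher(url, GOOGLEBOT_UA)
    as_browser = fetcher(url, BROWSER_UA)
    return as_bot != as_browser


if __name__ == "__main__":
    # Hypothetical usage; replace with a page you want to inspect.
    print(differs_by_user_agent("https://en.wikipedia.org/wiki/User_agent"))
```

A byte-for-byte comparison is deliberately crude; diffing the extracted text would cut down on false positives.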

m0dest over 9 years ago

I'd really love it if you repeated the same tests for Bing, just to get coverage. (Yahoo/Baidu would be the other big two.) Historically, Bing hasn't used fully functional headless browsers to crawl, which has limited its ability to index dynamic content like this. Google has "only" 70% market share, so it seems irresponsible to make engineering decisions without testing the others. Google+Bing+Yahoo+Baidu get you to 98%.

peterhartree over 9 years ago

The post author writes:
> So, very soon, the days of pre-rendering PhantomJs snapshots and serving shadow content to spiders will be over.
To be clear: webmasters of sites with dynamic content should not celebrate yet. There are still influential spiders other than Google's that do not parse JavaScript (for example, Facebook[1] and Twitter[2]).
[1] <a href="https://developers.facebook.com/docs/sharing/webmasters/crawler" rel="nofollow">https://developers.facebook.com/docs/sharing/webmasters/craw...</a>
[2] Can't find an official statement on this, but <a href="https://twittercommunity.com/search?q=javascript%20crawl" rel="nofollow">https://twittercommunity.com/search?q=javascript%20crawl</a>
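
To make the snapshot-serving idea concrete, here is a minimal sketch of the routing decision, assuming the server already has a pre-rendered HTML snapshot on hand. The function names and the token list are illustrative; `facebookexternalhit` and `Twitterbot` are the user-agent tokens those two crawlers are known to send, but a real deployment would need to maintain its own list.

```python
# User-agent substrings for crawlers assumed not to execute JavaScript.
NON_JS_CRAWLER_TOKENS = ("facebookexternalhit", "twitterbot")


def wants_snapshot(user_agent: str) -> bool:
    """Return True if the request looks like it comes from a non-JS crawler."""
    lowered = user_agent.lower()
    return any(token in lowered for token in NON_JS_CRAWLER_TOKENS)


def choose_response(user_agent: str, snapshot_html: str, app_shell_html: str) -> str:
    """Serve the pre-rendered snapshot to non-JS crawlers, the JS app shell otherwise."""
    return snapshot_html if wants_snapshot(user_agent) else app_shell_html


snapshot = "<html><body><h1>Rendered title</h1></body></html>"
shell = "<html><body><div id='app'></div><script src='/app.js'></script></body></html>"

facebook_ua = "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
browser_ua = "Mozilla/5.0 (X11; Linux x86_64; rv:42.0) Gecko/20100101 Firefox/42.0"

served_to_facebook = choose_response(facebook_ua, snapshot, shell)
served_to_browser = choose_response(browser_ua, snapshot, shell)
```

The snapshot should carry the same content a browser would end up rendering; serving something different to crawlers is the "sneaky" behaviour mentioned elsewhere in this thread.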

bigethan over 9 years ago

I'm curious how strongly google penalizes SPAs for being slow to load. The content may be indexed, but if your visitors are on a mobile network, that initial visit (or a visit with stale cache) is going to be crappy. It's great that they can read in the content (though bing cannot), but if it's buried on page two, does it even matter? As someone who is a proponent of web perf, these kinds of articles make me worried that server side rendering will be ignored because "SEO works now for Javascript", even if it's slow and google is only 70% of desktop & 80% of mobile search.

jimrandomh over 9 years ago

My theory is that the Google crawler is a modified, headless version of Chrome. These results seem consistent with that hypothesis.

dyoo1979 over 9 years ago

Probably relevant: <a href="http://googlewebmastercentral.blogspot.com/2014/05/understanding-web-pages-better.html" rel="nofollow">http://googlewebmastercentral.blogspot.com/2014/05/understan...</a>

anarchitect over 9 years ago

Crawling JS content, yes. But does it rank for that content in the same way it would if the whole document was generated on the server?

captainmuon over 9 years ago

I wonder if you could use this to find information about the google crawler. Inject system and browser info into the page. Then you can find out what kind of browser engine it runs, with which settings, etc. If you wanted, you could use this information to do undetectable masking (I don't think it would work in the long run, though). It would also be interesting to see what timeouts it still allows. I wouldn't be surprised if the modified browser "virtualizes" time and runs window.setTimeout immediately. Maybe you could make a busy loop and find out what the real timeouts are. It seems there have got to be some, otherwise this would open a way to DoS the crawler (not that I'd do that).
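
To illustrate what "virtualized" time could mean here, below is a toy sketch of a timer queue that fires callbacks in due order by jumping a virtual clock forward instead of actually waiting. This is purely hypothetical and says nothing about how Google's crawler really works; `VirtualClock`, `set_timeout`, and the `budget_ms` cutoff are all invented for the example.

```python
import heapq
import itertools
from typing import Callable, List, Tuple


class VirtualClock:
    """Runs timers without sleeping by advancing a virtual 'now'."""

    def __init__(self) -> None:
        self.now_ms = 0.0
        self._queue: List[Tuple[float, int, Callable[[], None]]] = []
        self._ids = itertools.count()  # tie-breaker so callbacks are never compared

    def set_timeout(self, callback: Callable[[], None], delay_ms: float) -> None:
        """Schedule callback to fire delay_ms after the current virtual time."""
        due = self.now_ms + delay_ms
        heapq.heappush(self._queue, (due, next(self._ids), callback))

    def run(self, budget_ms: float) -> None:
        """Fire every timer due within the virtual budget, earliest first."""
        while self._queue and self._queue[0][0] <= budget_ms:
            due, _, callback = heapq.heappop(self._queue)
            self.now_ms = due  # jump straight to the due time, no real waiting
            callback()


clock = VirtualClock()
fired: List[str] = []
clock.set_timeout(lambda: fired.append("fast"), 100)
clock.set_timeout(lambda: fired.append("slow"), 60_000)
clock.run(budget_ms=5_000)  # hypothetical cutoff: timers beyond 5 s never fire
```

Under a scheme like this a page's short timers complete instantly in wall-clock terms, while a busy loop would still burn real CPU time, which is why a busy loop might reveal the real limit.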

roboshake over 9 years ago

Google may be indexing dynamic content now, but the question I'm curious about is how it affects crawl efficiency. I can't imagine indexing JS content is as efficient as indexing content returned from the original HTTP request.

olalonde over 9 years ago

Related comment from an HNer who worked on this at Google (from 2006 to 2010): <a href="https://news.ycombinator.com/item?id=9531344" rel="nofollow">https://news.ycombinator.com/item?id=9531344</a>

gildas over 9 years ago

Regarding SPA-based websites, these results are relevant as long as your site has only a few pages. I would like to see the same kind of test on a site with 1000+ pages, for example. I already did this kind of test in the past and it failed miserably (i.e. only a dozen pages were correctly indexed).

spyder over 9 years ago

The next test could be: Does google crawl hidden text (display:none, very small, very transparent colored text)? My guess is they do crawl it because it can have legitimate uses, but if there is too much of it on a page then they give it a lower ranking.
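
As a starting point for such a test, here is a small sketch that separates visible from hidden text in a page using only inline `style` attributes. The thresholds (font size of 0 or 1px, opacity below 0.1) are arbitrary assumptions, the class name is made up, and it ignores stylesheets, scripts, and unclosed tags, so it only approximates what a rendering crawler could see.

```python
import re
from html.parser import HTMLParser
from typing import List

# Inline-style patterns treated as "hidden"; the thresholds are assumptions.
HIDDEN_PATTERNS = (
    re.compile(r"display\s*:\s*none"),
    re.compile(r"visibility\s*:\s*hidden"),
    re.compile(r"font-size\s*:\s*[01](px)?\s*(;|$)"),
    re.compile(r"opacity\s*:\s*0(\.0\d*)?\s*(;|$)"),
)
# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class HiddenTextFinder(HTMLParser):
    """Collects text nodes, split by whether an ancestor is styled as hidden."""

    def __init__(self) -> None:
        super().__init__()
        self._hidden_stack: List[bool] = []
        self.hidden_text: List[str] = []
        self.visible_text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = (dict(attrs).get("style") or "").lower()
        hidden_here = any(p.search(style) for p in HIDDEN_PATTERNS)
        inherited = bool(self._hidden_stack) and self._hidden_stack[-1]
        self._hidden_stack.append(hidden_here or inherited)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._hidden_stack:
            self._hidden_stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        hidden = bool(self._hidden_stack) and self._hidden_stack[-1]
        (self.hidden_text if hidden else self.visible_text).append(text)


page = ('<div><p>Visible</p><p style="display: none">stuffed keywords</p>'
        '<span style="font-size:1px">tiny</span></div>')
finder = HiddenTextFinder()
finder.feed(page)
```

With lists like these you could publish test pages with different ratios of hidden to visible text and watch which strings become searchable.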

Sarkie over 9 years ago

This article is from May. <a href="http://searchengineland.com/tested-googlebot-crawls-javascript-heres-learned-220157" rel="nofollow">http://searchengineland.com/tested-googlebot-crawls-javascri...</a>

andreasklinger over 9 years ago

I am sure that google "discovers" javascript/ajax content. They also mention this in their guides several times. But are there any experiments/results related to SEO impact, crawl frequency, etc.?

frik over 9 years ago

Offtopic: "Google search results on tablet". Recently Google changed their search result page for tablets. At first it looked fine and useful. But often the first result page is now completely full of advertisements; only the second page shows the usual links to websites like Github, Wikipedia, Youtube, etc. for a common search term. Very annoying! And the Youtube link is broken on iPad (it tries to link to a non-HTTP address). Am I just unlucky to be part of an A/B test? A news article about the changes: <a href="http://searchengineland.com/google-launches-new-search-results-interface-for-tablets-235340" rel="nofollow">http://searchengineland.com/google-launches-new-search-resul...</a>

mywacaday over 9 years ago

centrical.com is blocked where I work by McAfee Web Gateway because its GTI reputation identifies it as malicious and high risk

mcot2 over 9 years ago

iirc this is why they originally started development on what is now Chrome.

largote over 9 years ago

TL;DR: Yes, most of the time.