<p><pre><code> > If you're removing code or changing an endpoint,
> be careful you don't screw the Google bot, which
> might be "viewing" 3-day-old pages on your
> altered backend.
</code></pre>
An interesting proposition. Personally, unless I were operating in a sector where keeping Googlebot happy was key to staying competitive, and there was solid evidence that this could hurt my ranking, I don't think I'd be prepared to go to these lengths. Google is doing something quite atypical here compared to regular browsers, and I'd like to think Google's engineers are smart enough to account for this kind of thing early in planning.<p>They have a difficult cache invalidation problem here. The only way for them to find out whether the JavaScript a site uses has changed is to check whether the page HTML has changed. On top of that, the JavaScript can change without any noticeable change to the HTML.
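A rough Python sketch of that invalidation problem, assuming a crawler that decides whether a page is stale purely by hashing its HTML. The page, script contents, and the `fingerprint` helper are all hypothetical; nothing here reflects how Google actually does it.

```python
import hashlib

def fingerprint(*parts: str) -> str:
    """SHA-256 over the given strings, with a separator so ("ab", "c") != ("a", "bc")."""
    h = hashlib.sha256()
    for part in parts:
        h.update(part.encode("utf-8"))
        h.update(b"\0")
    return h.hexdigest()

# Hypothetical page: the HTML references a script by a fixed URL.
html = '<html><script src="/static/app.js"></script></html>'
old_js = "fetch('/api/v1/items')"
new_js = "fetch('/api/v2/items')"  # deployed later; the HTML is byte-for-byte identical

# A crawler that only compares HTML sees nothing to invalidate...
html_only_changed = fingerprint(html) != fingerprint(html)
# ...while comparing HTML plus script contents does notice the change.
html_and_js_changed = fingerprint(html, old_js) != fingerprint(html, new_js)

print(html_only_changed, html_and_js_changed)  # False True
```

Noticing the change means fetching every referenced script again, which is exactly the cost an HTML-only check is trying to avoid.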
Googlebot also does some other crazy stuff, like looking at URL patterns and then trying out variations. They're almost trying to sniff URLs!<p>For example, if I have a page:
www.domain.com/xyz/123<p>Googlebot will actually try URLs like the following, even though no other pages link to them:
<pre><code>  www.domain.com/xyz/1234
  www.domain.com/xyz/122
  www.domain.com/xyz/121
</code></pre>
and so on.<p>It's crazy how much 'looking around' they do these days!
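A toy Python sketch of the kind of guessing described above: perturb the trailing number in a path and append a digit. This is purely an illustration of the observed pattern; `neighbour_urls`, the `span` parameter, and the "next digit" rule are hypothetical and not Googlebot's actual logic.

```python
import re

def neighbour_urls(url: str, span: int = 2) -> list[str]:
    """Guess sibling URLs by nudging the trailing number in the path (hypothetical heuristic)."""
    match = re.search(r"(\d+)$", url)
    if match is None:
        return []  # no trailing number, nothing to vary
    prefix, number = url[: match.start()], int(match.group(1))
    # Numeric neighbours, e.g. 123 -> 121, 122, 124, 125 (skipping negatives).
    guesses = [f"{prefix}{number + d}" for d in range(-span, span + 1)
               if d != 0 and number + d >= 0]
    # Append the "next" digit after the last one, e.g. 123 -> 1234.
    next_digit = (number % 10 + 1) % 10
    guesses.append(f"{prefix}{number}{next_digit}")
    return guesses

for guess in neighbour_urls("www.domain.com/xyz/123"):
    print(guess)
```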
I'm not too surprised. Googlebot is still requesting old URLs on my site even though there are no incoming links to them (that I know of), and they've returned either a 404 or a 301 redirect for six months. I even tried using 410 Gone instead of 404, but it made no difference.
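For reference, a minimal Python sketch of the status-code choices mentioned above: 301 with a `Location` header for moved pages, 410 Gone for removed ones, 404 for anything unknown. The `RETIRED` table, the paths, and `status_for` are hypothetical; as the comment notes, picking 410 over 404 didn't stop the re-requests.

```python
from http import HTTPStatus

# Hypothetical table of retired paths; None means "gone for good".
RETIRED = {
    "/old-page": "/new-page",  # moved: answer 301 with a Location header
    "/discontinued": None,     # removed: answer 410 Gone
}

def status_for(path: str, live_paths: set[str]) -> tuple[HTTPStatus, dict[str, str]]:
    """Return (status, headers) for a request path."""
    if path in live_paths:
        return HTTPStatus.OK, {}
    if path in RETIRED:
        target = RETIRED[path]
        if target is None:
            return HTTPStatus.GONE, {}
        return HTTPStatus.MOVED_PERMANENTLY, {"Location": target}
    return HTTPStatus.NOT_FOUND, {}

print(status_for("/old-page", {"/new-page"}))
```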
Your users may be doing the same. It's not unusual for me to wake my laptop several days later and expect the open web pages to work without refreshing them.
I wonder if it's Google's visual site previews (the thumbnails you get when you click the arrow beside a search result) that are doing this.<p>Perhaps Google fetches the crawled page from its cache and then renders that for the previews?
Is this surprising? I'd expect this sort of behavior to be possible in any system that was vaguely Map-Reduce-y and operated at the scale of data that Google's indexing handles.