Hey HN folks,<p>A few months ago I received a manual action penalty from Google because they detected spam pages on our domain. The problem is that when people search on our site they are directed to a page with the following URL:<p>https://$domain/search?query=$QUERY<p>Some users (most likely bots) are generating huge spam searches on our search page and somehow Google is indexing the resulting URLs, even though there are no inbound links to these pages (at least I cannot find any).<p>To resolve this I did the following:<p>* On our search page I set the following header: X-Robots-Tag: noindex (based on the documentation here https://developers.google.com/search/reference/robots_meta_tag).<p>* Submitted URLs to be dropped from the Google index via the Webmaster console<p>* Submitted 3 reconsideration requests to Google to have the penalty removed<p>In theory this should stop all search pages from being indexed (as they all carry the noindex header), and it has helped drop the number of indexed pages marked as spam by 99%. However, we still have a significant number of URLs marked as spam, so our site still has a penalty from Google.<p>Has anyone had this issue before? How can I stop these pages from being indexed when I have the noindex header set _and_ a search for the spam URLs turns up no inbound links to them?<p>Any help appreciated folks!
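For anyone wanting to double-check the header setup described above, here is a minimal sketch of a WSGI middleware that attaches X-Robots-Tag: noindex to every response under the search path. The names add_noindex_to_search, demo_app and headers_for are made up for illustration, and the real site may well set the header in the web server config instead; treat this as one possible way to do it, not as what the poster actually runs.

```python
def add_noindex_to_search(app, search_path="/search"):
    """Wrap a WSGI app so responses for search_path carry X-Robots-Tag: noindex."""
    def middleware(environ, start_response):
        is_search = environ.get("PATH_INFO", "") == search_path

        def patched_start_response(status, headers, exc_info=None):
            if is_search:
                # Drop any existing X-Robots-Tag so conflicting values are not sent.
                headers = [(k, v) for k, v in headers if k.lower() != "x-robots-tag"]
                headers.append(("X-Robots-Tag", "noindex"))
            if exc_info is not None:
                return start_response(status, headers, exc_info)
            return start_response(status, headers)

        return app(environ, patched_start_response)
    return middleware


def demo_app(environ, start_response):
    # Hypothetical stand-in for the real site.
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<html>results</html>"]


def headers_for(path):
    """Call the wrapped demo app and return its response headers as a dict."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured.update(headers)

    wrapped = add_noindex_to_search(demo_app)
    list(wrapped({"PATH_INFO": path, "QUERY_STRING": "query=spam"}, start_response))
    return captured
```

Calling headers_for("/search") should show the header, while headers_for("/about") should not. The header only helps once Googlebot recrawls each URL, so already indexed pages can linger for a while.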
Hilarious how Google thinks they are now in editorial control of your content to the point where you are on the hook for fixing <i>their</i> bugs. You're being treated as a wayward content provider, when they should be happy to get the benefit of your content to index.
You need to add <meta name="robots" content="noindex, follow"> to the <head> section of all your search results pages.<p>You want robots NOT to index the pages but still to follow the links on your search pages.<p>Create a clean sitemap.xml file and submit it to Search Console.<p>Another way is to just canonicalize all search results pages to your search page.<p>With Google, these things take time. Once a page is in the index it will take a while to properly clean everything up. How was the traffic before this happened? Did the website rank for any decent keywords? Sometimes when this happens the smart thing to do is to just start from scratch with a new domain.<p>If you want more extensive help email me.
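To make the canonicalization idea concrete, here is a rough sketch that strips the query string from any search results URL and renders a canonical link tag pointing at the bare search page. The helper names and example.com are assumptions for illustration only. Note that Google treats rel=canonical as a hint rather than a directive, so this is not guaranteed to remove the spam URLs by itself.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_search_url(url):
    """Return the URL with its query string and fragment removed."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def canonical_link_tag(url):
    """Render a <link rel="canonical"> tag for the <head> of a search results page."""
    href = escape(canonical_search_url(url), quote=True)
    return '<link rel="canonical" href="%s">' % href
```

With this, every URL of the form https://example.com/search?query=... maps to the same canonical https://example.com/search.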
Based on my experience:<p>A.- I would also add "nofollow, noarchive" tags [1] to your X-Robots-Tag header:<p>- "nofollow" -> do not follow (i.e., crawl) any outgoing links on the page.<p>- "noarchive" -> prevents Google from showing the Cached link for a page.<p>B.- I would specify in Search Console (formerly Webmaster Console) how Google should handle the "query" parameter [2]<p>C.- Prevent those spam searches by blocking source IP addresses, User-Agents, combinations of both, etc.<p>Good luck!<p>[1] <a href="https://support.google.com/webmasters/answer/79812?hl=en" rel="nofollow">https://support.google.com/webmasters/answer/79812?hl=en</a><p>[2] <a href="https://www.google.com/webmasters/tools/crawl-url-parameters?hl=en&siteUrl=https://<domain>/" rel="nofollow">https://www.google.com/webmasters/tools/crawl-url-parameters...</a>
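Points A and C above could look roughly like this. It is only a sketch: the blocked network (a reserved documentation range) and the "spambot" substring are placeholders, not real offenders, and is_blocked and robots_header_value are hypothetical helper names. Real bots rotate IPs and spoof User-Agents, so a static blocklist like this is a starting point at best.

```python
import ipaddress

# Placeholder values; replace with whatever shows up in your own access logs.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
BLOCKED_AGENT_SUBSTRINGS = ["spambot"]


def is_blocked(remote_addr, user_agent):
    """Return True if the request comes from a blocked network or User-Agent."""
    addr = ipaddress.ip_address(remote_addr)
    if any(addr in net for net in BLOCKED_NETWORKS):
        return True
    agent = (user_agent or "").lower()
    return any(fragment in agent for fragment in BLOCKED_AGENT_SUBSTRINGS)


def robots_header_value():
    """Value for the X-Robots-Tag header combining the directives from point A."""
    return ", ".join(["noindex", "nofollow", "noarchive"])
```

A search handler could return a 403 (or an empty result page) when is_blocked(...) is true, and send robots_header_value() in X-Robots-Tag otherwise.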
You should use the canonical tag. Moz has a good page on how it works.<p><a href="https://moz.com/blog/canonical-url-tag-the-most-important-advancement-in-seo-practices-since-sitemaps" rel="nofollow">https://moz.com/blog/canonical-url-tag-the-most-important-ad...</a>
You could also annotate your page. <a href="https://schema.org/SearchResultsPage" rel="nofollow">https://schema.org/SearchResultsPage</a><p>Edit: Maybe it is also worth annotating the search field (<a href="https://developers.google.com/search/docs/data-types/sitelinks-searchbox" rel="nofollow">https://developers.google.com/search/docs/data-types/sitelin...</a>) so that google can match it against your search results page.
Register for Google Webmaster tools. There's an option in there to exclude links that have dynamic parameters. You can define the parameters you want it to ignore.
Adding <meta name="robots" content="noindex" /> to each page should work. Also, as a heads up, a Disallow entry in robots.txt is not enough, since pages can still be indexed if they are linked from anywhere else on the web.
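Related to the robots.txt point: a noindex tag or header only works if Googlebot is allowed to fetch the page and see it, so it is worth checking that robots.txt does not disallow the search path. Below is a small sketch using Python's standard urllib.robotparser; the two robots.txt bodies and example.com are made-up examples, and the stdlib parser is only an approximation of how Googlebot itself interprets the file.

```python
from urllib.robotparser import RobotFileParser


def googlebot_can_fetch(robots_txt, url):
    """Parse a robots.txt body and report whether Googlebot may crawl url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


# Hypothetical robots.txt contents for comparison.
BLOCKING = "User-agent: *\nDisallow: /search\n"
ALLOWING = "User-agent: *\nDisallow: /admin\n"

SEARCH_URL = "https://example.com/search?query=spam"
```

If googlebot_can_fetch(BLOCKING, SEARCH_URL) is false for your real robots.txt, the crawler never gets to see the noindex, and the URL can stay in the index based on external links alone.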
Can anyone answer a related question: Are you penalized for <i>not</i> running Google Analytics and/or Google Webmaster tools? In other words, if you have a clean website with no analytics whatsoever, is your ranking likely to be worse?
Heh, I ran into a similar issue previously: <a href="https://news.ycombinator.com/item?id=16302821" rel="nofollow">https://news.ycombinator.com/item?id=16302821</a><p>GoogleBot is broken.