Tell HN: A case of negative SEO I caught on my service and how I dealt with it

278 points by santah, over 4 years ago

Recently, my service <a href="https://next-episode.net" rel="nofollow">https://next-episode.net</a> experienced a huge drop in Google rankings. As I've been running it for more than 15 years, this is far from the first time this has happened. Usually I've been able to attribute big fluctuations (positive or negative) either to something I did, a Google algo change, or some external factor.

For example, about 2 years ago, something similar happened. While digging through my Search Console I discovered that Russian websites had generated thousands of links pointing to a page on Next Episode with pornographic keywords used as link anchors. This was so effective that they managed to get those keywords to the top of the "Top linking text" in Google Search Console - naturally (most likely) resulting in a drop in rankings for the regular keywords and the domain in general.

About a week ago, while trying to investigate the current drop in rankings and browsing through my "Latest links" external links export from Google Search Console, I noticed something funny. There were thousands of links in there (from 3 domains) following the same structure as on Next Episode: domain/show-name, domain/show-name/browse, domain/show-name/season-1, etc.

Following these links revealed something even funnier: all of them displayed content directly from my site! Not even scraped/cached content - they were dynamically pulling content from my server and displaying it on their domain. Even the search worked, as did the news archive and the top charts. Here is a list of those domains as an image: <a href="https://i.imgur.com/PjNKh0b.png" rel="nofollow">https://i.imgur.com/PjNKh0b.png</a>. I've since blocked their access, so opening any of them will not show my website right now, but here is how it looked: <a href="https://i.imgur.com/HBiL3yh.png" rel="nofollow">https://i.imgur.com/HBiL3yh.png</a>

Now, my first thought was that those were maybe scraping the content as part of a link farm (to spam with ads?), but I also wanted to know more. I experimented with Google searches that included pages from my website, like "Hot Shows - Next Episode", and ones with very specific news post subjects like "Streaming Services Availability added to Episodes and Movies" (posted in September last year). Imagine my surprise when I discovered that not only were the domains above indexed by Google (and listed in the search results), but there were 4-5 more domains that did the same thing, and some of them even outranked mine!

Here is a full list of domains that I discovered by searching for my news post subjects: <a href="https://i.imgur.com/dAm1CzI.png" rel="nofollow">https://i.imgur.com/dAm1CzI.png</a>. If you Google for site:domain.com you'll see some of them have thousands of pages indexed by Google. Trying out more keyword searches, I was also able to discover these domains: <a href="https://i.imgur.com/s5YjJWK.png" rel="nofollow">https://i.imgur.com/s5YjJWK.png</a> (as they've cached the content, they still work). Those all seem to be part of the same operation, but they serve a different purpose - they have only scraped the home page of Next Episode and all their links point to inside pages on the other domains. I suspect this is to generate incoming links to the other domains and give them some credibility.

As with the links with adult keyword anchors mentioned above, I suspect this whole thing is a negative SEO campaign - I don't see any other reason for it to be happening and it seems to be achieving its goal.

Once I found all I could find about the domains involved in this, I took some action:

1) I disavowed all those domains through the Google disavow tool.
2) I investigated if I could redirect their pages to mine (as they were dynamically pulling the content, I could change it to whatever I wanted). I managed to make it work through JavaScript (though interestingly, it had to be obfuscated as they were doing some sanitizing when pulling my content and replacing strings like "window.location.href" with "window.loc1ion.href"), but in the end I decided against it.
3) I blocked their IPs through CloudFlare (all Russian IPs). An interesting thing here is that once I blocked an IP, the domain would somehow automatically switch to another IP to pull my content from, but once I blocked like 10 or 15 of them, they seemed to run out of IPs and now they stay blocked.

I looked for a way to report those domains to Google, but as of today, I've not found the place to do it. Does anybody know? Today, about a week after I blocked the domains that pulled content from my site, they still have thousands of my pages indexed in Google and are ranking better in some search results than me. I'm guessing that with time, Google will catch up with the fact they don't show any content anymore and will delist those pages.

This whole thing was very new to me, so I hope it'll raise awareness that this is going on and maybe help someone else catch it happening to their website. I'd appreciate any feedback on this and I'm around if you have any questions. It would also be interesting to hear about anyone's related experiences. Cheers!
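
The link-export check described in the post can be sketched in a few lines of Python. This is only an illustration, not the author's actual process: the function names, the `min_matches` threshold, and the sample URLs are all assumptions, and the input is taken to be a plain list of linking URLs already read from the Search Console export. The `domain:` prefix is the form the disavow file uses for whole-domain entries.

```python
from collections import Counter
from urllib.parse import urlsplit


def find_mirror_domains(linking_urls, own_paths, own_host, min_matches=3):
    """Return {host: count} for external hosts whose URL paths repeat
    paths that exist on our own site (a hint that they mirror it)."""
    # Normalise our own paths once so trailing slashes don't matter.
    known = {p.rstrip("/") for p in own_paths}
    hits = Counter()
    for url in linking_urls:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
        if not host or host == own_host or host.endswith("." + own_host):
            continue  # skip our own domain and its subdomains
        path = parts.path.rstrip("/")
        if path and path in known:
            hits[host] += 1
    return {host: n for host, n in hits.items() if n >= min_matches}


def to_disavow_lines(hosts):
    """Format hosts as whole-domain entries for a disavow file."""
    return [f"domain:{host}" for host in sorted(hosts)]


# Hypothetical sample data, for illustration only.
own = ["/some-show", "/some-show/browse", "/some-show/season-1"]
links = [
    "https://mirror.example/some-show",
    "https://mirror.example/some-show/browse",
    "https://mirror.example/some-show/season-1",
    "https://blog.example/my-favourite-tools",
    "https://next-episode.net/some-show",
]
suspects = find_mirror_domains(links, own, "next-episode.net")
print(suspects)                    # {'mirror.example': 3}
print(to_disavow_lines(suspects))  # ['domain:mirror.example']
```

A path match alone is weak evidence (short paths can collide by chance), so anything flagged this way would still need to be opened and checked by hand, as the post describes.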

16 comments

Matsta, over 4 years ago

I'm sorry to say, but the neg SEO didn't drop your rankings; it was to do with the Google algorithm update [1]. Check the screenshot from Ahrefs [2]: your traffic drops on the 3rd of December, which is when the update went live.

[1] <a href="https://moz.com/blog/googles-december-2020-core-update" rel="nofollow">https://moz.com/blog/googles-december-2020-core-update</a>
[2] <a href="https://i.imgur.com/DBkdUEk.png" rel="nofollow">https://i.imgur.com/DBkdUEk.png</a>

Google's algorithm is smart enough to recognise neg SEO attacks. Sure, five years ago you could buy a blast of spammy links using Xrumer or GSA with some viagra anchor text and boom, your competition is gone.

From a quick glance, most of your pages have pretty thin content, and I assume it's pulling from an API, so none of it is unique. If there was one thing I would do, it would be to try to build some content on the pages. A great tool to analyse and develop content that is SEO friendly is SurferSEO - highly recommend it.

I'm surprised your forum doesn't rank as well as your main site, as it looks fairly active. However, I'm not sure how PunBB does SEO-wise.

javajosh, over 4 years ago

May I just say kudos, sir, for dealing with this situation with such aplomb. It is easy to imagine an alternative response, with far more anger and less curiosity. You are like a doctor looking at a disease: "Ah, look at this awful thing happening, how interesting!"

Also, given the way they were using your site, effectively reverse-proxying you and adding ads, it implies that you have access, in your server logs at least, to all of their traffic! And that might give you insight into their motivations, and maybe other elements of their operations. I mean, it sounds like a reasonably clever, small scale scam operation in Russia; but if they proved out the technique with your niche site, then they can easily duplicate it with other sites, in which case it is effectively a new kind of malware that has to be solved by Google!

Last but not least, I wanted to encourage you, and others, to consider whether this kind of attack would work in a decentralized world, what search looks like in that world, and therefore how this kind of attack might be mitigated.

santah, over 4 years ago

Update: After a week of doing nothing, they finally noticed their thing is blocked and sprang into action.

Apparently, they expanded their pool of available IPs they pull data from and now they seem to be endless (so some of the scraping domains actually work now).

I'm investigating what I can do about it. I'd appreciate any advice!

arn, over 4 years ago

Browsing through your SEO results, I also don't think the negative SEO is necessarily what did you in.

You have a very straightforward value prop: "next episode" of some-show. I think these sorts of optimized results are probably things that Google has been algorithmically adjusting for.

Looking at the ranked 1-3 terms you dropped for, it seems you dropped some pretty big terms and even keyword terms.

You were #1 for "seal team next episode", but now you rank #3. #1 got replaced by CBS's page, which is arguably a better result.

"black clover new episode" also dropped from #1, replaced by Wikipedia.

"the good place next episode": similar story.

I don't know what the best move is here. Algorithmic changes are really hard to combat without major changes, and even then, you don't have a ton of room to wiggle with next-episode content.

throwaway13337, over 4 years ago

Wow. That's absolutely horrible.

Looking at Google's search results, it's obvious that these tactics are rampant and really winning the war here.

We need a new search engine that cannot be gamed so easily. I know it's non-trivial, but the stakes are high, as is the reward for building one.

This is a real engineering challenge. I'm excited about the problem space and opportunity.

pilferz, over 4 years ago

Made an account here just to make this comment: You're going to want to send DMCA notices to both the registrar(s) AND Google.

1. Compile a list of domains and sitemaps that are 100% stealing and mirroring your content.
2. Go to Google's DMCA request page: <a href="https://www.google.com/webmasters/tools/legal-removal-request?complaint_type=dmca" rel="nofollow">https://www.google.com/webmasters/tools/legal-removal-reques...</a>
3. Fill out all relevant data, and submit the offending domains and URLs.

Wait a few days, and you'll be happy to see that those pages are blocked from Google entirely. Not many people know what to do when Google DMCAs them, so it could solve your problem permanently (or you can automate it).

Regarding physically blocking them from scraping your site, you've got a few options. Put Cloudflare up if it isn't already. They've got at least one anti-scraping application (Scrape Shield) that may help.

Another thing you can do is automate the scraping of their websites using distinct query parameters and try to exhaust their list of proxies by automatically logging and filtering them. This might be a fruitless endeavor if they're using rotating residential proxies, though.

Hope this helps, and good luck!
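
The "distinct query parameters" idea above could be sketched roughly as follows. This is a minimal illustration under several assumptions that are not from the thread: the parameter name `cb` is made up, the mirror is assumed to forward query strings unchanged to the origin, and the origin's access log is assumed to be in the common "combined" format with the client IP as the first field (behind a CDN the real client IP may live in a different field or header). The sketch only builds the canary URLs and reads the log; actually requesting the mirror pages and feeding the IPs into a block list is left out.

```python
import re
import secrets
from urllib.parse import urlencode

# Hypothetical parameter name; any parameter the mirror passes through would do.
CANARY_PARAM = "cb"

# Assumed combined log format: IP, two fields, [timestamp], "METHOD target ...".
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[A-Z]+ (\S+)')


def make_canary_url(mirror_base, path):
    """Build a mirror URL carrying a unique token; return (url, token)."""
    token = secrets.token_hex(8)  # 16 hex characters
    query = urlencode({CANARY_PARAM: token})
    url = f"{mirror_base.rstrip('/')}/{path.lstrip('/')}?{query}"
    return url, token


def ips_for_tokens(log_lines, tokens):
    """Return the client IPs in our own access log whose request target
    contains one of the canary tokens we sent to the mirror."""
    needles = [f"{CANARY_PARAM}={t}" for t in tokens]
    found = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # not in the assumed format; ignore
        ip, target = match.groups()
        if any(needle in target for needle in needles):
            found.add(ip)
    return found
```

Each token is requested only through the mirror, so any origin request carrying it most likely came from one of the mirror's upstream IPs. As the comment notes, this may not converge if the pool is made of rotating residential proxies.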

juanani, over 4 years ago

Sucks to see this. I think I even mentioned your site on here just this past week.

Didn't think I'd see the author, but since you're here, thanks, this has been my go-to over the years.

clscott, over 4 years ago

It's an interesting story. I wonder if you could turn the automation trick around on them.

Would you be able to make them do the same negative SEOing, but to their own site?

Fill their site with unrelated garbage and internal links with undesirable anchor text.

* unblock their IP
* create content that links back to their site with the undesirable keywords
* only show this content to them and not regular visitors
* don't let them grab much / any legitimate content

stickfigure, over 4 years ago

Earlier today in another thread we were joking about GaaS (Goatse as a Service), but now I think maybe that's not so crazy an idea after all.

<a href="https://news.ycombinator.com/item?id=26104087" rel="nofollow">https://news.ycombinator.com/item?id=26104087</a>

Your YC application practically writes itself.

melomal, over 4 years ago

Also, another update has been noticed for V-day celebrations: <a href="https://www.seroundtable.com/google-search-ranking-algorithm-update-30898.html" rel="nofollow">https://www.seroundtable.com/google-search-ranking-algorithm...</a>

slig, over 4 years ago

This is really frustrating, thanks for sharing. Google has had decades to figure out a way to detect duplicated content, spammy sites with this structure random-spam-keyword.spam-site.xyz/more-spam-words.html and the problem seems to get worse every year.

stanislavb, over 4 years ago

I feel you. First, how bad that feels, and second, the amount of time you need to spend fighting these things :/

tester34, over 4 years ago

Of course, Russians.

That happens when it is legal to hack/steal/cause damage to people from other countries.

devlopr, over 4 years ago

Is all of your content pulled via JavaScript? Could prerendering the content with a server-side language be part of your solution? You can still use JavaScript for everything else, just not the content.

thetinguy, over 4 years ago

DMCA the registrar.

BryanBigs, over 4 years ago

Was/is your canonical set correctly?
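
One way to check this, sketched with only the Python standard library: parse a page's HTML and see whether every `rel="canonical"` link is an absolute URL on the expected host. The function names and the sample markup are assumptions for illustration. The relevance to a proxying mirror is that a relative canonical would resolve against the mirror's own domain, whereas an absolute one keeps pointing at the original site unless the mirror rewrites it. Search engines treat the canonical link as a hint, so this is a sanity check rather than a fix.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if "canonical" in rels and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def canonical_urls(html):
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


def canonical_points_to(html, expected_host):
    """True only if at least one canonical exists and all of them are
    absolute URLs on expected_host (relative URLs have no hostname)."""
    urls = canonical_urls(html)
    return bool(urls) and all(
        urlsplit(url).hostname == expected_host for url in urls
    )
```

Running `canonical_points_to(page_html, "next-episode.net")` on the HTML served by the origin, and again on the HTML served through a mirror, would show whether the tag survives the proxying intact.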
