Google penalizes original content site because of scrapers

136 points by raphaelb about 14 years ago

11 comments

clintavo about 14 years ago
Google's latest algorithmic changes seem to be either horribly wrong or not fully baked.<p>Our site went from ranking #8 for our target search "artist websites" to PAGE 440 of the results. Our listing for "how to sell art" just went away. There's been nothing but original content on our site for 10 years, and, among artists, we're considered one of the best sources of art marketing information, given that I owned an art gallery for 20 years and all of our other writers are professional artists. (And yet Google still has <i>ehow</i> ranked for "how to sell art"... yeah, I'm sure ehow knows a <i>whole</i> lot more than we do.)<p>I'm saying this not to vent, but to concur with Aaron and others that there is something wrong. It may not hurt Google's business... the latest algorithms probably improve AdSense revenue, and that's fine, it's their business. Fortunately I've read HN long enough to know not to build my entire company on top of someone else's platform and, as much as it upsets me, we don't need Google. Bing (and Yahoo) have us at #3 for that same search ("artist websites"). We don't depend on search engines as our only source of marketing leads... nor even our main source.<p>The most frustrating thing is not even that it happens, but that they do not communicate. There's no way to find out WHAT happened. Nothing in Webmaster Tools. No way to pay for search support. I read the Google blog post with guidelines on how to structure content after Panda, and none of that applied to us, at least not that I could tell.<p>They say "just focus on users" and that's what we do, but I guess that's BS.<p>I, frankly, think Google's gotten too big for their britches and, as unlikely as it is to happen, I hope Bing, Blekko and yes, DuckDuckGo take some market share away. Windows is better for having OSX and Linux to compete with.
Maybe Google would be a bit less "evil" with more competition too.<p>Sorry for the bit of a rant, I'm usually only a lurker here, but this article of Aaron's really hit close to home this week. At least there are a couple of relevant points buried in my little rant... I hope ;-)
tristanperry about 14 years ago
Just like the anti-content farm blog posts previously, I'm half hoping that the internet community turns its attention to the problem of content scrapers (in the hope that Google take more action against the problem). I am a fan of Google - and the biggest source of traffic to my websites is Google search traffic - although the issue of scrapers does seem to be growing (even despite the attempted anti-scraper Google algorithm update earlier in the year).<p>A couple of days ago I did some searching online and found that a fair number of websites had copied some of my articles in their entirety. And sadly, a lot of these 'websites' were actually Google Blogger (Blogspot) blogs. And whilst some of these copied articles weren't appearing in Google search (I guess since the entire site contained copied/scraped content, thus giving them a Google SERP penalty?), some of the copied articles were appearing in the SERPs. And a couple of these websites even had Google AdSense on them.<p>So there was the crazy situation whereby my content had been stolen/scraped illegally, and put on a Google Blogger blog with Google Ads on it, and (in some cases) that blog then received traffic from Google Search. Hrmph.<p>In the interest of balance, I will point out that I filed Google DMCA requests after finding these scraped articles, and Google did promptly reply (a non-automated reply around 30 hours later, which is quick considering how many DMCAs Google must get).<p>They only removed the individual blog posts (and not the blogs overall, even though they were clearly spam blogs), but nonetheless I am happy with Google's quick response.<p>I just wish that content scraping weren't (in some cases) a profitable endeavor.
moultano about 14 years ago
The headline is factually inaccurate. It looks like a mistake that he was denied access to Adwords due to original content, but that has no connection to his ranking in search. It looks like the adwords representative may not have access to fine-grained enough tools to assess the site accurately, which is an organizational failing, but there's no bad intent there. In some sense it reflects how disconnected search and ads are from each other that they're using crude tools to assess original content.<p>Google cares a great deal about putting the original source of a piece of content first. If we're doing that incorrectly, it's because we screwed up, not because that's how things are designed. It's a hard problem and an area we are still working on intensely. It would be great if someone involved could post the queries on which we are screwing up so we can debug what's causing it.
staunch about 14 years ago
Is Google doing anything to solve the content duplication problem?<p>It seems like a solvable problem. Why don't they let webmasters implement some kind of time-based cryptographic signature?<p>It seems so lame that his problem has gone on so long, especially when there must be some kind of technical solution.<p>For real businesses spending a few days implementing some authentication protocol would not be particularly burdensome.
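One way to picture the "time-based cryptographic signature" idea above is the minimal sketch below. It is a hypothetical illustration only: the function names `sign_content` and `verify_claim`, the claim fields, and the assumption that a webmaster and a search engine share a secret key are all invented here, and nothing in this thread indicates that Google offers such a protocol. It uses HMAC-SHA256 from Python's standard library; a real scheme would more likely use public-key signatures.

```python
import hashlib
import hmac
import json


def sign_content(secret_key: bytes, url: str, body: str, published_at: int) -> dict:
    """Build a signed claim that `body` was published at `url` at `published_at`.

    Hypothetical scheme: the key is assumed to be shared with the verifier.
    """
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    claim = {"url": url, "sha256": digest, "published_at": published_at}
    # Serialize with sorted keys so signer and verifier hash identical bytes.
    payload = json.dumps(claim, sort_keys=True).encode("utf-8")
    claim["signature"] = hmac.new(secret_key, payload, hashlib.sha256).hexdigest()
    return claim


def verify_claim(secret_key: bytes, claim: dict, body: str) -> bool:
    """Check that the claim matches `body` and carries a valid signature."""
    unsigned = {k: v for k, v in claim.items() if k != "signature"}
    # The article text must hash to the digest recorded in the claim.
    if unsigned.get("sha256") != hashlib.sha256(body.encode("utf-8")).hexdigest():
        return False
    payload = json.dumps(unsigned, sort_keys=True).encode("utf-8")
    expected = hmac.new(secret_key, payload, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, claim.get("signature", ""))
```

Note that a timestamp the publisher signs alone proves little, since a scraper could sign an earlier date just as easily. A sketch like this would only help if the claim were registered with the indexer at publication time, so that the indexer's own record of when it first saw the claim settles who published first.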
rmason about 14 years ago
Here's a simple idea that could fix a lot of this problem. Copy Twitter's idea of verified accounts.<p>Google could issue verified status to sites. If someone copied a verified site, the copied content would be automatically removed from the index.<p>Now they would have to hire some staffers to research the applications and handle complaints. But this problem is beginning to cost them far more than adding a few more staffers would.
raganwald about 14 years ago
I'm sure this is a <i>gross</i> oversimplification, but Google is in the business of monetizing people's interest in content it doesn't create. Who will it perceive is the better partner for that monetization, the scraper who understands how to apply Google's tools to maximize monetization, or the original content author?
patrickj about 14 years ago
I'm the guy whose site this is all about - iPadInsight.com. I only used that specific search in the forum thread because I discovered that was the single point on which the Adwords reviewer had judged that my site didn't produce original content. I sent back the results of the same search showing that links were either from legit aggregator sites (like alltop.com) linking back to my original review, or from a number of scraper sites that rip my content. Even when the review was overturned for Adwords I was told it would leave a black mark against my site because it had already been marked that way. Great system.<p>Soon after all that hassle, my site suddenly lost 60% of its traffic. From what I can gather, mine is one of the quality sites that produce original content that has been mistakenly penalized in the Panda / Farmer / Whichever Other updates.<p>Among the reasons I say my site is a quality site that produces original content, in accordance with this post at the Google Webmaster Central Blog (<a href="http://googlewebmastercentral.blogspot.com/2011/05/more-guidance-on-building-high-quality.html" rel="nofollow">http://googlewebmastercentral.blogspot.com/2011/05/more-guid...</a>) and with all the logic I can apply to the subject, are:<p>-- The site contains over 1,700 posts published in the last 15 months. I wrote around 1,550 of them myself. The remainder are written by three other occasional authors, who are colleagues and friends of mine. There's no 'outsourcing' of content creation or anything of that ilk.<p>-- I spend tons of hours every day researching and writing the content that appears on my site. Every app review on the site is 100% original content (<a href="http://ipadinsight.com/category/ipad-app-reviews" rel="nofollow">http://ipadinsight.com/category/ipad-app-reviews</a>), as are all posts published.<p>-- I do consider myself an expert on the subject my site covers - the iPad. 
I have been writing app reviews, accessory reviews, tips, and how-to posts on it ever since it launched. I've appeared on ABC World News and numerous radio programs as an iPad and Apple expert. I've been a contributing author for iPhone and iPad Life magazine (printed publication) since their debut issue - writing expert tips and tricks posts, buyer's guide articles, and more. I'm listed in Robert Scoble's Twitter list of best tech people to follow. Blue-chip app publishers and accessory vendors approach me to write about their products. The Daily (the first iPad-only newspaper) contacted me before their app even hit the App Store, as do many leading publishers. I've been a beta tester for many top iOS apps for years. I participate regularly at several leading iPad and iOS forums. I'm not saying any of this to boast, but in an effort to establish that I'm a blogger who is enormously passionate about the subject I cover, and someone who is respected in the area (mobile tech) that I write on.<p>-- My site is a long-standing member of the Got-OATS group of sites (<a href="http://www.gotoats.org/" rel="nofollow">http://www.gotoats.org/</a>) that seek to uphold and promote the highest ethics in app reviews.
We never accept money for reviews or coverage, and add disclosure statements to our reviews to indicate whether we received a promo code for an app reviewed, or a sample unit of an accessory reviewed.<p>-- I spend a lot of time on every single post, on researching, on testing apps and whatever else I'm covering, on ensuring that spelling and grammar are spot-on, on providing good screencaps of apps in action, and every other detail I can think of.<p>-- I use a great caching plugin on my site and do my best, with help from a few WordPress experts, to keep the site fast and clean.<p>-- I currently have close to 4,500 RSS subscribers and over 3,000 Twitter followers for the site's account.<p>-- Before my recent sudden traffic fall off a cliff due to Panda, my site had around 80,000-100,000 unique visitors per month.<p>As for search results and scraper sites, I am still often seeing horrendous spam sites ranking above me for recent posts. Here is just one quick example on a recent post I wrote about iPad rivals, where several scraper sites rank above mine, including one (ipads101.com) which I have submitted 3 spam reports on via Google Webmaster over the last two months, and had zero response:<p><a href="http://www.google.com/search?sourceid=chrome&#38;ie=UTF-8&#38;q=ipad+rivals+the+year+of+the+clueless" rel="nofollow">http://www.google.com/search?sourceid=chrome&#38;ie=UTF-8&#3...</a><p>I run a good site. I pour hours of effort and my heart and soul into it. And I think it has been very wrongly assessed by whichever new algorithm.
blauwbilgorgel about 14 years ago
I believe Panda also looks at originality, content freshness, document authority, trust factors, usability factors, site authority etc.<p>I'd agree that the denying of Adsense was (obviously?) wrong, if this is all there is to the picture. As for looking at ipad information on the internet, after a manual inspection of that site, I, as a user that cares for quality and relevance, don't need any of the results on justanotheripadblog.com in my top 100.<p>The order of relevance, discovery and editorial quality seems to flow from:<p><a href="http://reviews.cnet.com/8301-19512_7-20023976-233.html" rel="nofollow">http://reviews.cnet.com/8301-19512_7-20023976-233.html</a><p>&#62;<p><a href="http://ipadinsight.com/ipad-tips-tricks/how-to-make-airprint-work-with-just-about-any-printer/" rel="nofollow">http://ipadinsight.com/ipad-tips-tricks/how-to-make-airprint...</a><p>&#62;<p><a href="http://www.info4arab.com/how-to-make-airprint-work-with-just-about-any-printer/" rel="nofollow">http://www.info4arab.com/how-to-make-airprint-work-with-just...</a><p>With a lot of intermediate steps.<p>iPadInsight.com is not a cheap scraper site, but is it a site that does original research, beyond rehashing what is hot in the industry? I think Panda might have judged correctly in not assigning higher rankings to this site.<p>The site seems to have had a canonical problem with the comments in 2010, inflating the site's size in the index by a factor of 10. The depth of these comments is usually not much more than: "Great! Interesting Article! Love this! Thank you!" and might just as well have been auto-generated.<p>Also the shareasale footerlink "Thesis Theme for WordPress" alone might disqualify you for running Adsense, as you dofollow an affiliate link (and this is not allowed in the webmaster quality guidelines).<p>The trademark inside the domain name might be another issue.
Alex3917 about 14 years ago
The same thing happened to us as well. When I did an event last month our page rank dropped from 4 to 2 despite getting 12+ new links, which I strongly suspect is because all of the bloggers who link to our conference site are having their blogs duplicated by content farms. Overall our page rank has dropped from 7 to 2 in little over a year, despite having 10x more inbound links. (And zero SEO or anything else that would violate Google's best practices.)<p>I did ask a Google employee, who said it was because we weren't using canonical tags, but this doesn't make much sense and fixing this doesn't seem to have done anything to improve the situation.
antimatter15 about 14 years ago
I don't understand the incentives for Google to deny AdWords to someone. AdWords is what ultimately gives Google the profits, not AdSense. Google already has plenty of places for you to see ads, and AdWords is the product that actually takes money acquired through other industries and funnels it into Google.<p>I really don't think it's being done out of malicious intent. I think it's very likely that it's just being done because of negligence, since service/app reviews happen to be frequently scraped.
pixcavator about 14 years ago
Hell with Facebook’s replacements; it’s Google who needs a replacement! (Note: I am currently thinking about possible alternatives for PageRank.)