Google search and search engine spam

239 points by jsm386 over 14 years ago

46 comments

mmaunder over 14 years ago
"we’re evaluating multiple changes that should help drive spam levels even lower, including one change that primarily affects sites that copy others’ content and sites with low levels of original content."<p>"As “pure webspam” has decreased over time, attention has shifted instead to “content farms,” which are sites with shallow or low-quality content. In 2010, we launched two major algorithmic changes focused on low-quality sites. "<p>Looks like 2011 is the year that Google kills the scrapers. Look for an uptick in the sensitivity of the duplicate content penalty.
eps over 14 years ago
Metrics-shmetrics. Once I stop seeing StackOverflow clones listed above StackOverflow's original pages I will gladly believe that Google's search quality is "better than ever before."
abrahamsen over 14 years ago
Google have two strong incentives to weed out AdSense drivel sites in the search results.<p>1. They diminish the value of Google Search as an advertising platform. And Google Search is likely the most valuable virtual estate on the net. I more often click on ads in Google Search than I click on ads on all other sites combined. This is because when I'm on Google Search I'm actually searching for something, so I might click on a relevant ad.<p>2. They diminish the value of AdWords content network ads. People pay Google to display their ads because they believe they get better return for their money there than on the alternatives (Yahoo and Microsoft). Ads on low quality sites are unlikely to be competitive, so these sites decrease the relative value of AdWords.<p>That is, high-ranked low quality sites with AdSense are a double threat to the main source of income for Google, and I expect Google to make it their main priority.<p>Why, then, aren't they more successful? My guess: Because the problem is a lot harder than any armchair designer would believe. Problems tend to be a lot simpler when you are not the one who must solve them.
dpapathanasiou over 14 years ago
And yet Mahalo is still tolerated, somehow.<p>E.g., this query -- "travel agent vermont" (which I got from this post complaining about Mahalo spamming the web and Google not enforcing its own qc standards <a href="http://smackdown.blogsblogsblogs.com/2010/03/08/mahalo-com-meet-the-new-spam-worse-than-the-old-spam/" rel="nofollow">http://smackdown.blogsblogsblogs.com/2010/03/08/mahalo-com-m...</a>) <i>still</i> returns a Mahalo result in the top 10.
noibl over 14 years ago
<i>One misconception that we’ve seen in the last few weeks is the idea that Google doesn’t take as strong action on spammy content in our index if those sites are serving Google ads.</i><p>That's not quite what I've been reading. I believe the more common claim is that Google has a disincentive to algorithmically weed out the kind of drivel that exists for no other reason than to make its publisher money via AdSense. It's about aggregate effects, not failure to clamp down on individual sites. Or, put another way, it's not <i>if</i> certain sites are serving Google ads, it's <i>because</i> that kind of content is usually associated with AdSense.<p>AdSense is definitely a problem for search quality. It creates the same imperative for the content farm as Google Search has: get the user to click off the page as soon as possible. And the easiest way to do that is to create high-ranking but unsatisfying content with lots of ad links mixed in.
jonpaul over 14 years ago
You know, sometimes it really makes me mad that it's difficult to get into contact with a person at Google for support of their products. But, I've got to hand it to them. They could have issued a non-personal public statement like most companies and signed it with "Google Search Team." But instead there is a personal touch. It's public statements like these, with just a bit of personal touch, that make people love them. My 2 cents. Entrepreneurs take note.
dustingetz over 14 years ago
Google has taken a lot of criticism on HN and elsewhere for an apparent perverse incentive, to direct searchers to content farms with adwords, instead of the original source (like StackOverflow or Amazon reviews).<p>I'm skeptical, because spammy-ad clickthrough rates are already low and trending lower, and I speculate google has great incentive to send people where they want to go lest their competitors get stronger.
snewman over 14 years ago
To me, the most interesting aspect of this situation is the conflict between Google's view and the blogosphere's view. On the one hand, "...according to the evaluation metrics that we’ve refined over more than a decade, Google’s search quality is better than it has ever been...". On the other hand, you can't open an RSS reader today without tripping over someone griping about content farms polluting the search results. There are intelligent, thoughtful people on both sides of the debate. Why such disparate viewpoints?<p>As Matt's post suggests, it could simply be that people's expectations are rising -- search results are getting so good in general (which they are) that we notice the problems more. Or it could be that Google is focused on a narrow definition of "spam" that doesn't cover content farms. It could even be that both sides are "right" -- that overall search quality is rising even as the content farm problem worsens, if Google has been successfully reducing other causes of low search quality.<p>I'd love to see some hard analysis of this. For instance, pick a reasonably large set of sample queries, and show what the results looked like five years ago, and what they look like today. Of course, you'd first have to find a set of sample queries and results from five years ago.
bmastenbrook over 14 years ago
Beyond the flood of SEO spam and Demand Media-style content mills, there's another search quality problem I have with Google: torrent sites. I will frequently search on the exact name of a song or album on Google in order to find out more information about that song or album, but lately most of the results have been links to torrents, including results on the first page. This applies even if I add "review" to the search query. I will even see links to torrents ranking above links to iTunes.<p>These songs and albums are not available legitimately through torrents. What value is there in providing links to pirated content? I understand that Google is not under any legal obligation to remove these results, but as a non-pirate these results are significantly lowering my perception of the quality of Google's search results.
theoretical over 14 years ago
I'd be interested in a further explanation of "Google absolutely takes action on sites that violate our quality guidelines [...]".<p>Does that mean that Google manually decrease rankings of spammy sites that their algorithms haven't caught? Does this entail decreasing the rank of the entire domain, the IP? Does blacklisting ever happen?<p>I ask since Google have previously[1] said they don't wish to manually interfere with search results.<p>[1] "The second reason we have a principle against manually adjusting our results is that often a broken query is just a symptom of a potential improvement to be made to our ranking algorithm" - <a href="http://googleblog.blogspot.com/2008/07/introduction-to-google-ranking.html" rel="nofollow">http://googleblog.blogspot.com/2008/07/introduction-to-googl...</a>
powrtoch over 14 years ago
Am I the only one who was really hoping for some specifics about what they're doing and plan to do about content farm rankings? Without that, the article is virtually devoid of content other than "we're really not so bad!"<p>Edit: By specifics, I don't necessarily mean implementation details, just anything more informative and plan-of-action than acknowledging the problem.
ericb over 14 years ago
While I applaud the direct personal response, I feel like the content says "we don't see a problem." If users see a problem and you don't, smaller competitors can eat your lunch. I'm kind of hoping for some competition in the field.<p>In terms of adsense, if you really think about it, adsense content on a page should probably be a slightly negative ranking signal (not just not a positive signal). The very best quality pages have no ads. Think of government pages, nonprofits, .edu, quality personal blogs, etc. If no one is making money off a page (no ads) then whatever purpose it has, it is likely to be non-spammy.
jpalomaki over 14 years ago
I believe it is very hard to implement algorithms that can tell the difference between stackoverflow.com and a rip-off, or a legitimate Apache mailing list archive and a rip-off.<p>Why not allow the community to sort this out? "Google Custom Search" already exists. Google could extend it so that people could customize Google search to exclude certain sites from the results (right now it is only possible to specify a list of sites to include in the search).<p>Blacklists for at least specific "fields of searching" would emerge very quickly. People could select what blacklists to use, if any.
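A rough sketch of how the subscribable blacklists suggested above might be applied on the client side. This is only an illustration: the function names and the `.example` domains are hypothetical, and nothing here reflects an actual Google Custom Search feature.

```python
from urllib.parse import urlparse


def domain_of(url):
    """Return the lowercase host of a URL, without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def is_blocked(url, blacklist):
    """True if the URL's host is a blacklisted domain or a subdomain of one."""
    host = domain_of(url)
    return any(host == d or host.endswith("." + d) for d in blacklist)


def filter_results(urls, blacklists):
    """Drop results found in any subscribed blacklist, preserving rank order."""
    merged = set().union(*blacklists)  # an empty subscription list blocks nothing
    return [u for u in urls if not is_blocked(u, merged)]


# Hypothetical usage: one community-maintained list for programming searches.
programming_list = {"so-clone.example"}
results = [
    "https://stackoverflow.com/questions/1",
    "http://www.so-clone.example/questions/1",
    "https://mail-archive.example/msg/42",
]
print(filter_results(results, [programming_list]))
```

Matching on the host rather than the full URL keeps each list short, at the cost of blocking a whole domain even when only part of it is a rip-off.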
mmaunder over 14 years ago
As a pointer: Matt_Cutts is the head of the webspam team at Google. He's been very active on this thread. Please search below for his posts.
eitland over 14 years ago
There are a few signals that should be possible to pick up. Examples:<p>- When power searchers start adding -somedomain.xyz to their searches<p>- Increase spam reporting by adding some kind of feedback to the spam reporting feature. I think I'd love to get an automated mail saying something like: "The site somespamdomain.xyz that you and others reported x days ago is now handled by our improved algorithms". Submitting spam reports really doesn't feel useful when it seems like nothing ever happens.<p>- Adding weight to spam reports. You know a lot about us, and I guess you can filter out who are power searchers. This could help stop people from gaming the system into blocking competitors.
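To make the first and third signals above concrete, here is a minimal sketch that counts `-domain` exclusions in a query log and weights each one by how much the searcher is trusted. The log format, the trust table, and the 0.1 default weight are all assumptions made for illustration, not anything Google has described.

```python
from collections import Counter


def excluded_domains(query):
    """Return domains excluded with a leading minus, e.g. '-somedomain.xyz'."""
    found = []
    for token in query.lower().split():
        # Assumption: an exclusion is a token starting with '-' that contains a dot.
        if token.startswith("-") and "." in token[1:]:
            found.append(token[1:])
    return found


def exclusion_scores(query_log, trust, default_trust=0.1):
    """Sum searcher trust per excluded domain.

    query_log: iterable of (user, query) pairs.
    trust: dict mapping known power searchers to a weight.
    """
    scores = Counter()
    for user, query in query_log:
        # A domain counts at most once per query, however often it is repeated.
        for domain in set(excluded_domains(query)):
            scores[domain] += trust.get(user, default_trust)
    return scores


# Hypothetical log: two trusted searchers exclude the same clone site,
# while an unknown account tries to bury a competitor.
log = [
    ("alice", "python sort list -so-clone.example"),
    ("bob", "python sort dict -so-clone.example"),
    ("mallory", "cheap loans -honest-bank.example"),
]
print(exclusion_scores(log, {"alice": 1.0, "bob": 1.0}).most_common())
```

Giving unknown accounts a small weight instead of zero still lets many independent reports add up, while making it expensive for one party to game the signal against a competitor.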
jmount over 14 years ago
Google AdSense for Domains ( <a href="http://www.google.com/domainpark/" rel="nofollow">http://www.google.com/domainpark/</a>) really makes a lie of not wanting useless content. They designed a revenue source for parkers/squatters.
jonknee over 14 years ago
The timing of this is interesting... About a week before Demand Media's IPO. Must be a bad day for the investment bankers.
bitskits over 14 years ago
I think there might be something else at work here: our rising expectations of how search engines should work.<p>In years past, Google's results were measurably less relevant than they are today. In the time between "then" and "now", we've grown more accustomed to high quality, fast, relevant results. I think this makes it seem like small problems in search are bigger than they are.<p>It would be great if there was a "Google of 2004" to test this side by side, but I don't think that is possible. :)
mwilton13 over 14 years ago
@Matt I like that you pinpointed some specifics here, but is the algorithm going to be strong enough to easily pick up things like this: <a href="http://posterous.com/people/YrCushFlSet" rel="nofollow">http://posterous.com/people/YrCushFlSet</a> This single account is feeding 100 different websites alone that all feed hundreds of others. This is helping fuel numerous sites in the plastic surgery niche and it's quite disturbing.
gojomo over 14 years ago
Please, exile half or more of Demand Media's pages from the index <i>before</i> their imminent IPO!
tmsh over 14 years ago
I'm just impressed with how this was handled. Consider how much more technocratic this is than a news release from ten years ago.<p>Google issues an announcement via blog post. TC and others start to pick it up. And the original author of the blog post takes questions and provides technical answers, where allowed, in HN.
vannevar over 14 years ago
Google cannot escape their fundamental conflict of interest: they make money by selling web traffic to advertisers, then buying that traffic back at discounted rates and re-selling it again, over and over. None of their revenue comes directly from search, though search is their primary source of raw traffic. Their search results don't have to be good, they just have to be good enough to sustain traffic. And right now there are so many people who reflexively use Google out of habit that their results could deteriorate substantially (and many would argue already have) before it impacts their shell-game revenue stream.
nowarninglabel over 14 years ago
I'm curious, though, what metrics they use to evaluate effectiveness against spam. It could have just as much (or as little) spam indexed as it had 5 years ago, and in some comparisons that would be valid, but what if much of the spam had moved from being evenly distributed throughout results to being distributed in the top positions? Then, one could say spam was even lower than ever in total quantity, but it would be even worse in terms of user experience.<p>That said, I agree with Google, users' expectations have skyrocketed, and it is tough to keep pace with them.
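The distinction drawn above, total spam versus where the spam sits, can be illustrated with a small sketch. The logarithmic rank discount below is borrowed from DCG-style ranking metrics and is purely an assumption; the metrics Google actually uses are not described in the thread.

```python
import math


def spam_fraction(labels):
    """Share of results labelled spam, ignoring position."""
    return sum(labels) / len(labels)


def weighted_spam_exposure(labels):
    """Position-discounted spam share: rank r carries weight 1 / log2(r + 1)."""
    weights = [1 / math.log2(rank + 1) for rank in range(1, len(labels) + 1)]
    spam_weight = sum(w for w, is_spam in zip(weights, labels) if is_spam)
    return spam_weight / sum(weights)


# Two hypothetical ten-result pages, each containing exactly two spam results.
spread_out = [False, False, False, True, False, False, False, True, False, False]
top_heavy = [True, True, False, False, False, False, False, False, False, False]

for name, page in (("spread out", spread_out), ("top heavy", top_heavy)):
    print(name, spam_fraction(page), round(weighted_spam_exposure(page), 3))
```

Both pages have the same raw spam fraction, but the position-weighted score is higher for the page where spam occupies the top two slots, which is the scenario the comment worries about.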
mssfldt over 14 years ago
Spam is not only in organic search but also in image search. I observed a site that steals 140,000 (!) images by hotlinking (including some of my pictures). At first the pages themselves seemed to be "clean". They only set hotlinks to blended search images, and they got a lot of traffic, that's for sure. Then they switched the site: at the top there are two porn ads (it was xslt*us). I wrote a spam report and posted it on a webmaster forum. But it took about 10 days until the site was removed. Hope this gets better... And: hotlinking is a big problem.
zone411 over 14 years ago
First, I haven't noticed any significant increase in the frequency of spam sites appearing in the Google search results. The biggest problem that I did notice is social bookmarking sites, like reddit and digg, outranking original content. They often have nothing more than a copied-and-pasted paragraph, sometimes supplemented by low quality comments (as is common with these types of sites). Since this site is very similar, not many people will be concerned about this issue.<p>Second biggest issue is poor Wikipedia articles appearing in the top results for almost any reference type query. Many less frequently updated Wikipedia articles are nothing but regurgitated content lifted from other quality sources. What makes it worse is that Wikipedia is using no-follow for their links, so even if these sites are linked in the reference section, they won't get any credit. It's interesting to see so many people complain about low quality content on commercial sites, but they never mention Wikipedia, which is a much bigger offender (I guess this might be because Wikipedia gets its content for free and doesn't have ads and other sites pay for the content and do have ads).<p>Third, I hope Google doesn't make any changes without checking very carefully that good sites will not be negatively affected. For example, newspapers will often have the exact same articles from the AP, but also original content based on their own reporting. Punishing them for having duplicate content would not work well. There are many similar possible pitfalls.
beefman over 14 years ago
"The short answer is that according to the evaluation metrics that we’ve refined over more than a decade, Google’s search quality is better than it has ever been in terms of relevance, freshness and comprehensiveness."<p>The long answer is that without Wikipedia results, Google's search quality would be at an all-time low in terms of relevance, freshness and comprehensiveness.
kellysutton over 14 years ago
When a company needs to write a blog post in this tone, they are definitely losing ground.<p>What you are saying != how you are performing.
darksaga over 14 years ago
This is just lip service - Google's quality has dropped off and it's really obvious. Recently, I've been regularly comparing the results I get from Google and the ones I get from Bing. Needless to say, Bing's are far more relevant. The biggest points people have made are about the money searches like "MP3 Player" but the results I've been comparing have been local searches and programming things like: "show/hide text boxes in Javascript." In Google all I get is links to Amazon and other random results to link farms like javascriptworld.com. In Bing, I get links to forums and tutorials which is what I'm looking for.<p>Time and time again Google has failed. I've already moved on to Bing and Duckduckgo and I would recommend you do too. Unless you like digging through hordes of useless SERPs.
paul9290 over 14 years ago
A recent Google search for the Walmart being built 2 miles away from me led me nowhere. Google listed 2 pages of job sites listing open positions at this Walmart. I almost gave up my search, but decided to search Twitter and in doing so I found what I was looking for!
Benvie over 14 years ago
For the love of god just give us the tools for effective personal blacklists. With Google's constant changes to the search site and the difficulty in efficiently and effectively monitoring live search results via browser extensions, it's been at best hit or miss. Whether that comes in the form of some API that can be tapped to make a good extension or having it built into the browser, I don't care.<p>Google of all companies I would have thought would understand and respect the importance of giving people the power over their own technology experiences.
spiffworks over 14 years ago
I can't help but think that this is almost an exact parallel of the iPhone antenna problem. Both companies had minor to medium problems in their flagship products, both problems were vastly overblown by the media, both problems spurred an unbelievable spate of batshit conspiracy theories, and to take the cake both companies responded with the same "This is not a problem, but here's the solution." Good problems to have.
antirez over 14 years ago
What sounds odd in all this is that I think the spam sites and "content farms" are generating a lot of ad clicks for Google. Will they really take appropriate action against these sites if this will mean a significant cut in earnings?<p>I played with adsense a lot in the past, and if you did too you should know how spam sites generate a lot more clicks than sites where the user is actually focused on reading content...
tomotomo over 14 years ago
Why don't we collaboratively blacklist or push down domains from our Google results? This could be a stopgap measure until Google incorporates such a feature including the collaborative database or magically puts an end to spam.<p>A proposal: <a href="http://www.saigonist.com/content/google-spam-content-farm-filter" rel="nofollow">http://www.saigonist.com/content/google-spam-content-farm-fi...</a>
jacoblyles over 14 years ago
In other industries, the government regulates that there must be a "Chinese Wall" (no communication or shared personnel) between segments of a company that have conflicts of interest. In this case, search and advertising qualify. I expect to see proposals for this kind of regulation in the next 10 years as the FCC begins regulating the internet.
PaulHoule over 14 years ago
This reads just like the nutrition label on Nestle products: the ones that boast that the product has "3 vitamins AND minerals", and, on a distant part of the package, lists two minerals and one vitamin and what the health benefits of them are.<p>They'd be a lot more credible without the corpo-speak junk in the first paragraph.
mbesto over 14 years ago
What are people's thoughts on companies who do create content farms? From the perspective of being a successful company rather than "I hate the spam and I hope they all DIAF".<p>Personally I think any type of "scheming" in technology will eventually get caught and then all of a sudden there goes your business model.
rhwd2003 over 14 years ago
@Matt_Cutts starting around page 4 results for the term [loans] what is with all the .edu sites? This seems to be an error on the ranking side of things for all these .edu sites for a financial related term such as [loans]? All the results are .edu after page 4?
jrmg over 14 years ago
The problem isn't just rank though. After I've seen the original source of duplicated content, I don't just want sites that copy it to rank below it, I want to not see them at all, so that the rest of my results are filled with /different/ things.
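One way to picture the behaviour asked for above is to collapse near-duplicate results so that only the first copy survives. The sketch below uses word shingles and Jaccard similarity; the 0.8 threshold, the shingle size, and the `.example` URLs are arbitrary assumptions. Note that it keeps whichever copy ranks first, which is only the original if the ranking already got that right.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles in a text (one short shingle if the text is tiny)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def collapse_duplicates(results, threshold=0.8):
    """Keep a result only if it is not a near-duplicate of an earlier kept result.

    results: list of (url, text) pairs in rank order.
    """
    kept_urls = []
    kept_shingles = []
    for url, text in results:
        current = shingles(text)
        if all(jaccard(current, earlier) < threshold for earlier in kept_shingles):
            kept_urls.append(url)
            kept_shingles.append(current)
    return kept_urls


# Hypothetical result page: the second entry copies the first verbatim.
page = [
    ("https://original.example/a", "how to sort a list in python using the sorted builtin"),
    ("https://copy.example/a", "how to sort a list in python using the sorted builtin"),
    ("https://other.example/b", "reading a file line by line in python with a context manager"),
]
print(collapse_duplicates(page))
```

Dropping the copies entirely, rather than demoting them, frees the remaining slots for different content, which is the point of the comment.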
fzk390 over 14 years ago
I also have one spam site example. <a href="http://www.google.com/search?q=internet+phone+service" rel="nofollow">http://www.google.com/search?q=internet+phone+service</a> Look at the 3rd result for internetphoneguide.org
franksinatra2 over 14 years ago
I was hoping to see something in the latest blog post from Google about spammers using Exact Match Domains to get huge algorithmic boosts. EMD + tools like xrumer usually = first page results :(
Darin81 over 14 years ago
What about sites like ezinearticles? Where they do not pay for content, isn't that more like organic content from experts? They are not content machines.
dcdan over 14 years ago
What is the technical difference between Google being able to accurately measure the volume of spam, and Google removing the spam?
huertanix over 14 years ago
tl;dr: Our algorithms already stop resultspam. Y'all are trippin'.
klbarry over 14 years ago
I just want to thank Matt Cutts for always being classy. His blog posts/comments always set a high bar.
ddemchuk over 14 years ago
It doesn't appear that Cutts & Co. are looking to address any of the more popular blackhat link building methods that all popular SEO bloggers continually say "work but you shouldn't use them yourself because they're bad".<p>Until the keyword "buy viagra" isn't littered with forum link and comment spam and parasite pages, Google's algo is still not "fixed"
hessenwolf over 14 years ago
According to our metrics we are great; pity-about-you, unless you can "Please tell us how we can do a better job."<p>I expected more. It reads like a content farm.