
Hi HNers,

I thought I had a decent idea for automatic website scraper removal on a previous story (http://news.ycombinator.com/item?id=2541853), but I commented quite late, so I suspect nobody really saw my clarified comments. Feel free to downvote/flag this submission if it is inappropriate/spammy to submit a comment.

What if a content creator could push a content publication notice to Google, along with a list of domains they think will scrape it? (Content creators often seem to know who their frequent plagiarizers are.) Google could then index the scraper sites immediately, and index them again after some interval (a day or a week, say).

Assuming the immediate notification lets Google index the scraper sites before they can poll for the original content and duplicate it, scraping is proven if the first index shows none of the content and the second index shows a sufficient amount of exactly matching content. This obviously doesn't catch people who bother to paraphrase the original content, but it can defeat automated copying, and the amount of copied content required would have to be large enough to allow for legitimate quoting. Google could then penalize accordingly (lower ranking, AdWords ban, or whatever).
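To make the before/after check concrete, here is a rough sketch in Python. It is only an illustration of the idea, not how Google or anyone else actually does it. The function names, the 8-word shingle size, and the 0.5 threshold are all made up for the example, and "exact matching content" is approximated by overlapping runs of consecutive words.

```python
def shingles(text, n=8):
    """Return the set of all runs of n consecutive words in text."""
    words = text.split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def copied_fraction(original, page, n=8):
    """Fraction of the original's n-word runs that appear verbatim in page."""
    orig = shingles(original, n)
    if not orig:
        return 0.0  # original is shorter than n words, nothing to compare
    return len(orig & shingles(page, n)) / len(orig)


def is_probable_scrape(original, first_crawl, second_crawl,
                       threshold=0.5, n=8):
    """Flag a suspect page as a scrape of the original.

    first_crawl:  text of the suspect page fetched right after the
                  publication notice (hypothetical input)
    second_crawl: text of the same page fetched after the waiting interval
    threshold:    assumed minimum copied fraction, set high enough that a
                  short legitimate quote does not trigger it
    """
    before = copied_fraction(original, first_crawl, n)
    after = copied_fraction(original, second_crawl, n)
    # Proof of scraping: nothing matched at first, a lot matched later.
    return before == 0.0 and after >= threshold
```

A page that quotes a sentence or two stays under the threshold, and a page that already had the content at the first crawl is not flagged, since the timing then proves nothing. As in the proposal, paraphrased copies slip through, because only verbatim word runs are counted.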

Ask HN: Would this be a useful automatic scraper-finding algorithm?

no comments
