I think this is pretty fair on Google's part. How could you possibly figure out who owned content?<p>What if I published a book, it was copy-pasted in blogs, and then later I put it somewhere crawlable by Google? You certainly can't just say "first time we saw it, that's the proper owner". It would either require a massive amount of manual QA to get right (and even then, there are going to be interminable copyright battles), or have a super high error rate.<p>I think Google's best value is letting proper content owners easily find violators via normal searches, and let them deal with them via takedown notices or the court system -- which is where it should be done, not in a pseudo-court run by a Google who does not want that responsibility.
I was huge into SEO for a few years. I try to stay out of it now, but it's worth noting that this is almost certainly due to the current algorithm's obsession with "freshness." The weaker site is ranking higher with the stolen content because their site was updated more recently. Steal some back and I bet they swap ranks again.<p>Also, the combination of the PageRank algorithm and normal user behavior typically helps Google to understand who was first and who deserves to rank higher. That is, most people don't plagiarize content; they quote it and then cite the source, which (thanks to PageRank) tends to rank the original better than sites which have plagiarized it.
This came up before on YC.[1] Google does have a system to detect provenance, but you have to report your changes to Google as an RSS feed.[2] Google hasn't updated that page since 2010, and it may no longer do anything.<p>[1] <a href="https://news.ycombinator.com/item?id=10103545" rel="nofollow">https://news.ycombinator.com/item?id=10103545</a>
[2] <a href="https://pubsubhubbub.appspot.com/" rel="nofollow">https://pubsubhubbub.appspot.com/</a>
What a rubbish article. It doesn't fly for a second under copyright law. Google is entirely within their rights doing what they're doing. The onus isn't on Google to detect the infringing content.<p>For anyone interested in copyright and legal issues, I'd recommend checking out techdirt.com. They have a great starter section at <a href="https://www.techdirt.com/blog/?tag=techdirt+feature" rel="nofollow">https://www.techdirt.com/blog/?tag=techdirt+feature</a>, and they cover legal, copyright, patent, surveillance and all sorts of related topics. High quality journalism.
Is this any better/worse than Facebook actively trying to profit and win over users when people or organizations copy / upload / soak up views for material they did not create and don't have the rights to use? Because that's a hot topic of discussion in some creative circles as well.
Recognizing "stolen" content autonomously can only pivot on knowing when something was first published or visible to Google, which is a pretty dubious measurement.
This is not stealing, and even if it is illegal, that is a bad way to put it. Also, Google's service is primarily to the searcher, so this isn't a huge issue for them.