Wow, I think I underestimated the complexity of what Google is working on here. I had always assumed it was a fairly standard (perhaps distributed), well-trained Bayesian classifier. The article does a good job of highlighting some of the other issues they are facing. It makes me more tolerant of the little spam that does get through.
"the rarity with which users feel the need to check their spam box for false positives demonstrates a high precision of classification"<p>I find this a curious claim. How does people not checking their spam boxes demonstrate that there are, in fact, few false positives?
I have a simple technique Gmail could use that would cut the email delivered to my inbox by over 90%. I don't correspond with anyone in Spanish, French, Russian, Italian, Chinese, or any of a long list of other languages, so Gmail could put messages in those languages straight into the spam folder.
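The language heuristic above could be sketched roughly like this. It is a minimal Python sketch, not how Gmail works: the tiny stopword lists, the 30% non-Latin threshold, and the names `guess_language` and `should_divert` are all hypothetical, and a real filter would use a trained language identifier.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny stopword lists; a real system would use a
# trained language identifier instead.
STOPWORDS = {
    "en": {"the", "and", "is", "you", "to", "of", "for", "this", "that", "with"},
    "es": {"el", "la", "los", "las", "que", "y", "para", "con", "por", "una"},
    "fr": {"le", "les", "des", "et", "est", "pour", "avec", "une", "vous", "dans"},
    "it": {"il", "gli", "che", "e", "per", "con", "una", "sono", "della", "questo"},
}

# Unicode character-name prefixes for scripts that are clearly not Latin.
NON_LATIN_PREFIXES = ("CYRILLIC", "CJK", "HIRAGANA", "KATAKANA", "HANGUL", "ARABIC", "GREEK")


def is_non_latin(ch):
    """True if the character's Unicode name marks it as a non-Latin script."""
    return unicodedata.name(ch, "").startswith(NON_LATIN_PREFIXES)


def guess_language(text):
    """Return a language code from STOPWORDS, 'non-latin', or 'unknown'."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return "unknown"
    # Mostly non-Latin letters (e.g. Russian, Chinese): no stopwords needed.
    if sum(is_non_latin(ch) for ch in letters) / len(letters) > 0.3:
        return "non-latin"
    # Letter-only word tokens (Unicode-aware, no digits or underscores).
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {lang: sum(w in sw for w in words) for lang, sw in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def should_divert(text, user_languages=frozenset({"en"})):
    """Divert to spam only when the language is confidently outside the user's set."""
    lang = guess_language(text)
    return lang != "unknown" and lang not in user_languages
```

Text the sketch cannot classify stays in the inbox, so a message is never diverted on language grounds unless some evidence points to a language the user doesn't read.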
Spam tip:
Have your personal domain forward all email to Gmail. Then, when signing up on a website, enter "name_of_website.com@yourdomain.com" as your email address. This shows you which websites sell your address to spammers. It also offers some protection against websites stealing your email password, since no site ever learns the address you actually use to log in to Gmail.
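Spotting a leak with this trick could be automated along these lines. This is a minimal Python sketch that assumes the catch-all forwarding is already set up and that you can read each message's recipient and sender addresses; `me.example`, `site_tag`, and `suspected_leaks` are hypothetical names. Sender addresses are trivially forged, so treat the counts as a hint rather than proof.

```python
from collections import Counter


def site_tag(recipient, my_domain):
    """Return the per-site local part (e.g. 'shop.example'), or None for other domains."""
    local, _, domain = recipient.rpartition("@")
    if not local or domain.lower() != my_domain.lower():
        return None
    return local.lower()


def suspected_leaks(messages, my_domain):
    """Count mail to each per-site address that came from outside that site.

    `messages` is an iterable of (recipient, sender) address pairs.
    """
    leaks = Counter()
    for recipient, sender in messages:
        tag = site_tag(recipient, my_domain)
        if tag is None:
            continue
        sender_domain = sender.rpartition("@")[2].lower()
        # Mail from the site itself or one of its subdomains is expected.
        if sender_domain != tag and not sender_domain.endswith("." + tag):
            leaks[tag] += 1
    return leaks
```

For example, if `forum.example@me.example` starts receiving mail from `pills@spam.example`, `suspected_leaks` reports a count for `forum.example`, pointing at the site that leaked or sold the address.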
Great $self->patOnBack(), but web spam is a huge unsolved problem that hurts Google's core business and is possibly the biggest threat to its revenue and market share. It's also the biggest opportunity for innovators in the search space right now.