I have a mail server for my personal email that I've been maintaining for the better part of a decade. It started out as Qmail/Courier, changed to Postfix/Courier and finally Postfix/Dovecot, always with Spamassassin as the spam filter.<p>For a while, it was configured to discard spam. Later, I configured it to put the spam in a special IMAP folder. I have the same system today, but on top of that, I have a background shell script that watches my mail folders and runs sa-learn on them. Anything that appears in the spam folder is learned as spam. Anything that appears elsewhere is learned as ham. Most of the time, this will be email that has already been classified correctly, but if I move an email, it will reclassify it.<p>Some time ago, it began leaking spam into the inbox. It turned out that I needed to make the Bayes scores more aggressive, and after I did that, it's been more or less perfect. More or less zero false positives and maybe a couple of false negatives per week.<p>I find that the good old Bayes classifier is still the most powerful tool in my spam filtering toolbox. You just have to be persistent and consistent in how you train it and tune it. For example, you shouldn't train it to classify legitimate newsletters as spam even if they are undesirable. Instead, you should unsubscribe and put those in the trash folder. I find that this considerably reduces the false positives.
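The folder-based training rule described above (spam folder learned as spam, everything else learned as ham) could look roughly like this. It is a minimal sketch that assumes a Maildir++ layout with a hypothetical `.Spam` folder, and it does one pass instead of watching the folders; a real watcher would call `train()` on a timer or on filesystem events. `sa-learn --spam` / `--ham` with a directory argument are standard SpamAssassin options. Because sa-learn remembers which messages it has already learned, re-running it is what lets a moved message get reclassified.

```python
import subprocess
from pathlib import Path

# Hypothetical layout: a Maildir++ tree whose spam folder is named ".Spam".
SPAM_FOLDER = ".Spam"


def training_commands(maildir: Path, spam_folder: str = SPAM_FOLDER) -> list[list[str]]:
    """Build one sa-learn command per folder: --spam for the spam folder, --ham for all others."""
    # The top-level Maildir (INBOX) plus every Maildir++ subfolder (their names start with ".").
    subfolders = sorted(p for p in maildir.iterdir() if p.is_dir() and p.name.startswith("."))
    commands = []
    for folder in [maildir] + subfolders:
        cur = folder / "cur"  # messages the IMAP client has already seen
        if not cur.is_dir():
            continue
        flag = "--spam" if folder.name == spam_folder else "--ham"
        commands.append(["sa-learn", flag, str(cur)])
    return commands


def train(maildir: Path) -> None:
    """Run one training pass; a watcher script would call this repeatedly."""
    for cmd in training_commands(maildir):
        subprocess.run(cmd, check=True)
```

Building the command list separately from running it keeps the classification rule easy to check without invoking sa-learn.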
I used to run Postfix + Dovecot + Rspamd with all the bells and whistles enabled [1], but I recently switched to OpenSMTPD + spamd on OpenBSD.<p>Unlike Rspamd, which has pluggable modules for everything under the sun (RBLs, word filters, Bayesian filtering + learning), spamd uses plain ol' graylisting with some PF integration to throttle spammers' connections to 1 character/second for maximum annoyance.<p>With Rspamd, I never got <i>any</i> spam in my Inbox. With spamd, I get maybe 1 spam mail every two weeks. To me, spamd's ridiculous simplicity is worth the tradeoff.<p>You do have to be careful with graylisting large mailers like Gmail, since they rarely retry the mail from the same IP address. For this, OpenBSD's smtpctl now has the spfwalk [2] command to whitelist the big guys. That's what I use in my current setup [3], which was linked here a few days ago.<p>[1] <a href="https://www.c0ffee.net/blog/mail-server-guide/" rel="nofollow">https://www.c0ffee.net/blog/mail-server-guide/</a><p>[2] <a href="https://poolp.org/posts/2018-01-08/spfwalk/" rel="nofollow">https://poolp.org/posts/2018-01-08/spfwalk/</a><p>[3] <a href="https://github.com/cullum/dank-selfhosted" rel="nofollow">https://github.com/cullum/dank-selfhosted</a>
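To make the Gmail caveat concrete, here is a minimal sketch of generic triplet greylisting (client IP, envelope sender, envelope recipient). It is not spamd's implementation and does not model its PF tarpitting; the delay and window values are illustrative assumptions, and `whitelist_ips` stands in for the addresses you might collect with spfwalk.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

# Illustrative timings only; check spamd(8) or your greylister's docs for real defaults.
PASS_DELAY = 25 * 60          # a retry must come at least this long after the first attempt...
RETRY_WINDOW = 4 * 60 * 60    # ...and no later than this, or the triplet starts over

Triplet = tuple[str, str, str]  # (client IP, envelope sender, envelope recipient)


@dataclass
class Greylist:
    whitelist_ips: set[str] = field(default_factory=set)  # e.g. big senders' ranges from spfwalk
    first_seen: dict[Triplet, float] = field(default_factory=dict)
    passed: set[Triplet] = field(default_factory=set)

    def check(self, ip: str, sender: str, rcpt: str, now: Optional[float] = None) -> str:
        """Return an SMTP-style reply for one delivery attempt."""
        now = time.time() if now is None else now
        if ip in self.whitelist_ips:
            return "250 ok (whitelisted)"
        key = (ip, sender.lower(), rcpt.lower())
        if key in self.passed:
            return "250 ok"
        first = self.first_seen.get(key)
        if first is None or now - first > RETRY_WINDOW:
            self.first_seen[key] = now  # new or expired triplet: start the clock
            return "451 temporary failure, try again later"
        if now - first < PASS_DELAY:
            return "451 temporary failure, try again later"  # retried too soon
        self.passed.add(key)  # well-behaved MTAs retry; most spam cannons do not
        return "250 ok"
```

Because the client IP is part of the key, a sender that retries from a different address each time keeps looking brand new and never passes, which is exactly why the big mailers need whitelisting.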
> The current code is getting old, and there is interest in applying deep-learning techniques to the spam-detection problem.<p>Yes please! I archive all my mail, both desired and spam mail, with the intention of using this data to train a neural network that will be able to classify mail as spam, desired newsletter or desired personal mail.
> some sites are using it to detect spam submitted in web forms, for example.<p>I actually have the web server send the form contents via email because my email server runs SpamAssassin and it has done a great job catching form spam.
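The form-to-email approach might look something like the following sketch using Python's standard `email` and `smtplib` modules. The addresses are hypothetical, and it assumes the local MTA runs this mail through SpamAssassin like inbound mail; depending on configuration, locally submitted mail may skip content filtering, so it may need to be routed through the same path.

```python
import smtplib
from email.message import EmailMessage

# Hypothetical addresses: a mailbox on the server that runs SpamAssassin.
FORM_FROM = "webform@example.org"
FORM_TO = "forms@example.org"


def form_to_email(form_name: str, fields: dict[str, str]) -> EmailMessage:
    """Package a web-form submission as a plain-text email so the mail filters can judge it."""
    msg = EmailMessage()
    msg["From"] = FORM_FROM
    msg["To"] = FORM_TO
    # Headers stay server-controlled; user input goes only in the body, avoiding header injection.
    msg["Subject"] = f"Web form: {form_name}"
    msg.set_content("\n".join(f"{name}: {value}" for name, value in fields.items()))
    return msg


def send(msg: EmailMessage, host: str = "localhost") -> None:
    # Hand the message to the local MTA (assumed to pass it through SpamAssassin).
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)
```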
I let Google filter my spam these days, but back when I did it myself I liked popfile better because it could handle general classification versus just spam or not-spam.<p>So I could train it to move mail into a "High Priority" bucket, or "Not Spam, But Promotional" bucket, etc.<p>Curious if SpamAssassin can do that now.<p>I was also curious about the "freemail antiforge" feature mentioned in the article, but couldn't find much about it.
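The multi-bucket idea is a small generalization of the usual spam/ham Bayes filter. This sketch is plain multinomial naive Bayes with add-one smoothing over arbitrary buckets; it is not POPFile's or SpamAssassin's exact algorithm, and the tokenizer and bucket names are assumptions for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Deliberately crude tokenizer (an assumption); real filters also use headers, URLs, etc.
    return re.findall(r"[a-z0-9']+", text.lower())


class BucketClassifier:
    """Multinomial naive Bayes over any number of buckets, with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts: dict[str, Counter] = defaultdict(Counter)  # bucket -> token counts
        self.doc_counts: Counter = Counter()                         # bucket -> training messages
        self.vocab: set[str] = set()

    def train(self, text: str, bucket: str) -> None:
        tokens = tokenize(text)
        self.word_counts[bucket].update(tokens)
        self.doc_counts[bucket] += 1
        self.vocab.update(tokens)

    def classify(self, text: str) -> str:
        total_docs = sum(self.doc_counts.values())
        best, best_score = "", -math.inf
        for bucket, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # log P(bucket) + sum of log P(token | bucket); smoothing keeps unseen tokens nonzero
            score = math.log(self.doc_counts[bucket] / total_docs)
            for tok in tokenize(text):
                score += math.log((counts[tok] + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best, best_score = bucket, score
        return best
```

Nothing here is specific to two classes: adding a "promotional" or "high priority" bucket is just another `train()` label.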
Spam detection on my personal email server took a huge leap forward when I got Spamassassin correctly configured to use the DNS blocklists (URIBL, etc.). I had to set up a local DNS server because the blocklists weren't responding to requests coming via my hosting provider's DNS, and the scores had to be tweaked over a few weeks, but now it's working great. The content rules like "hi my dear", "request for money", and "all caps" are still in there, but the blocklists do the heavy lifting.
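For anyone wondering what a blocklist lookup actually is: it's an ordinary DNS query. IP lists are queried with the address's octets reversed, and domain (URI) lists with the domain prepended to the list's zone. This sketch only builds names and interprets answers; `zen.spamhaus.org` and `multi.uribl.com` are real zones used as examples. Some lists (Spamhaus documents this) answer with 127.255.255.x to signal a refused query, e.g. one arriving via a shared or public resolver, which is why running your own resolver can matter.

```python
import ipaddress
import socket
from typing import Optional


def dnsbl_query_name(ip: str, zone: str) -> str:
    """IPv4 blocklists are queried by reversing the octets: 192.0.2.1 -> 1.2.0.192.<zone>."""
    octets = ipaddress.IPv4Address(ip).exploded.split(".")
    return ".".join(reversed(octets)) + "." + zone


def uribl_query_name(domain: str, zone: str) -> str:
    """Domain blocklists are queried with the domain itself prepended to the zone."""
    return domain.rstrip(".").lower() + "." + zone


def lookup(query_name: str) -> Optional[str]:
    """Resolve the query name; NXDOMAIN (or any resolver failure) is treated as 'not listed'."""
    try:
        return socket.gethostbyname(query_name)
    except socket.gaierror:
        return None


def is_listed(answer: Optional[str]) -> bool:
    # Listings conventionally come back as 127.0.0.x; 127.255.255.x is an error code, not a listing.
    return answer is not None and answer.startswith("127.") and not answer.startswith("127.255.255.")
```

Usage would be along the lines of `is_listed(lookup(dnsbl_query_name(client_ip, "zen.spamhaus.org")))`; SpamAssassin then turns each hit into a rule score, which is the part that needed tweaking.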
It can be exciting when your filter blocks a ton of spam, but then you get a few false positives, and of course they turn out to be very important emails. I think the spam filter needs to be plugged into the MTA, so that when a message is flagged as spam, the MTA can tell the sender: hey, this message got flagged as spam.
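Plugging the filter into the MTA boils down to deciding the SMTP reply at end-of-DATA, while the sender's server is still connected (in practice via something like a milter or Postfix's before-queue content filter). The sketch below is hypothetical: the reject threshold is an assumption, and `tag_at=5.0` only mirrors SpamAssassin's default `required_score`.

```python
from typing import NamedTuple


class Verdict(NamedTuple):
    reply: str   # what the SMTP server answers at end-of-DATA
    action: str  # what happens locally


def end_of_data_verdict(score: float, tag_at: float = 5.0, reject_at: float = 10.0) -> Verdict:
    """Map a spam score to an SMTP reply (thresholds are illustrative assumptions)."""
    if score >= reject_at:
        # A 5xx reply makes the *sending* server report the failure, so a real sender finds out.
        return Verdict("550 5.7.1 Message content rejected as spam", "reject")
    if score >= tag_at:
        # Accepted but filed as spam: the sender is never told, so false positives here are silent.
        return Verdict("250 2.0.0 Ok: queued", "deliver-to-spam-folder")
    return Verdict("250 2.0.0 Ok: queued", "deliver")
```

The middle band is where the silent false positives live; rejecting at SMTP time trades that for the occasional bounced legitimate message, which at least reaches its sender.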
One thing I really liked about SpamAssassin, given how prevalent it was, was that I could add hashcash stamps to my outbound e-mails and basically eliminate the chance of my e-mail being caught as spam. This was probably a decade ago (I've since moved to gmail), but it would take 10-60 seconds per recipient to generate the hashcash, and SpamAssassin would give those messages a large weight in their favor.<p>I'm kind of surprised that it never caught on.
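The per-recipient cost comes from hashcash's proof of work: the sender searches for a stamp whose SHA-1 hash starts with a given number of zero bits, and the recipient verifies it with a single hash. This is a minimal sketch of the version-1 stamp format (`ver:bits:date:resource:ext:rand:counter`); it omits the expiry and double-spend checks a real verifier performs, and is not the hashcash tool's code.

```python
import base64
import hashlib
import os
from datetime import datetime, timezone
from typing import Optional


def leading_zero_bits(digest: bytes) -> int:
    """Count zero bits at the start of a digest."""
    return len(digest) * 8 - int.from_bytes(digest, "big").bit_length()


def mint(resource: str, bits: int = 20, date: Optional[str] = None) -> str:
    """Brute-force a counter until SHA-1(stamp) has at least `bits` leading zero bits."""
    date = date or datetime.now(timezone.utc).strftime("%y%m%d")
    rand = base64.b64encode(os.urandom(12)).decode()  # random salt, 16 base64 chars
    counter = 0
    while True:
        stamp = f"1:{bits}:{date}:{resource}::{rand}:{counter:x}"
        if leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits:
            return stamp
        counter += 1


def verify(stamp: str, resource: str) -> bool:
    """One hash to check: right version, right recipient, enough zero bits."""
    ver, bits, _date, res, _ext, _rand, _counter = stamp.split(":")
    return (ver == "1" and res == resource
            and leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= int(bits))
```

Since the resource is the recipient's address, every recipient needs a separately minted stamp, and each extra bit doubles the expected minting work; that asymmetry is what made stamps cheap to check and expensive to mass-produce.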
SA is good for small orgs. For my own server, I just use regex rules in postfix called S25R [1]. It has been sufficient for me and keeps the run queue very low.<p>I implemented S25R regex rules in a medium-sized company that had a very bad spam problem due to old Outlook setups replying to all the spammers with OOO replies. They were using SA and tried to keep up on rule tuning, but it was a losing battle. The OS run queue on the 6 inbound servers averaged 6 and would sometimes peak at 14+. I switched to S25R regex rules and the run queue dropped to 0.2. I did reject some "valid" emails from folks who had left the company and were running their own businesses from their home cable modems, but eventually whitelisted some of them. The employees were very happy with the change. They went from receiving 2k+ spam msgs per day (each) to less than a dozen.<p>[1] - <a href="http://www.gabacho-net.jp/en/anti-spam/anti-spam-system.html" rel="nofollow">http://www.gabacho-net.jp/en/anti-spam/anti-spam-system.html</a>
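S25R works on the connecting client's reverse-DNS hostname: names that look like dynamic or residential addresses (lots of digits, dsl/dhcp prefixes) or that have no reverse DNS at all get rejected. The patterns below are modeled on the rules published on the linked S25R page and should be treated as an assumption to verify against that page; in Postfix they would go in a regexp table used by `check_client_access`. The whitelist argument reflects the same escape hatch used above for ex-employees.

```python
import re

# Assumption: modeled on the S25R rules from the linked page; verify against it before use.
S25R_PATTERNS = [re.compile(p) for p in (
    r"^[^.]*[0-9][^0-9.]+[0-9].*\.",                 # digits, non-digits, digits in first label
    r"^[^.]*[0-9]{5}",                                # five consecutive digits in first label
    r"^([^.]+\.)?[0-9][^.]*\.[^.]+\..+\.[a-z]",       # label starting with a digit, deep name
    r"^[^.]*[0-9]\.[^.]*[0-9]-[0-9]",
    r"^[^.]*[0-9]\.[^.]*[0-9]\.[^.]+\..+\.",
    r"^(dhcp|dialup|ppp|[achrsvx]?dsl)[^.]*[0-9]",    # typical dynamic-pool prefixes
)]


def looks_dynamic(client_hostname: str, whitelist: frozenset = frozenset()) -> bool:
    """True if the client's rDNS name looks like an end-user/dynamic host (reject candidate)."""
    name = client_hostname.lower().rstrip(".")
    if name in whitelist:
        return False  # explicitly allowed senders, e.g. known small businesses on cable modems
    if name in ("", "unknown"):
        return True   # Postfix reports clients without reverse DNS as "unknown"
    return any(p.search(name) for p in S25R_PATTERNS)
```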
Hmm. Maybe it's time to give SA a try again.<p>I had very little luck keeping SA performing well over time. It would do a decent job for a few months, then seem to fall off the rails whenever a particular new wave of spam came along, even with graylisting on.<p>I finally gave up and have been using... a Barracuda box. I hate to say it (because they're stupid expensive for all but the smallest box), but it has worked incredibly well, with zero overhead. I've gone for over a year without so much as logging in to the box.<p>Ultimately, I think the 'cuda is just running SA with Barracuda's collective filtering added on, but it sure works for me and my dozen or so users.<p>Maybe Barracuda doesn't sell that many spam boxes (they have quite a few other products), but I can say that it has sure worked for me.<p>If I could keep SA well tuned with minimal effort, however, the savings would make it worthwhile.
Not directly related to SA being back, but since spam-fighting stories and approaches are being shared here: I used to run SA with postfix, but it was quite a memory hog, occasionally triggering the OOM killer. Half a year ago I disabled it and revised my regular postfix (+ postscreen) settings, consulting a couple of helpful articles [1,2], and I haven't gotten any "real" spam (i.e., not counting Bitbucket) since.<p>The incoming spam likely varies between people and their mailboxes, but tweaking those standard settings can be surprisingly effective.<p>[1] <a href="http://rob0.nodns4.us/postscreen.html" rel="nofollow">http://rob0.nodns4.us/postscreen.html</a><p>[2] <a href="http://jimsun.linxnet.com/misc/postfix-anti-UCE.txt" rel="nofollow">http://jimsun.linxnet.com/misc/postfix-anti-UCE.txt</a>
A big shout-out to Julian Field et al. at MailScanner. I don't manage any mail servers these days, but going back around 15 years, it was at the heart of several setups.<p><a href="https://www.mailscanner.info/" rel="nofollow">https://www.mailscanner.info/</a><p>And, yes, it uses Spamassassin.
Between SpamAssassin, amavis, and postscreen (mainly postscreen), I rarely see any spam anymore.<p>But I do see that development has been kind of slow, and not just with SA: the same goes for almost everything email-related.
It truly frightens me how many people gave up and handed all of their email over to Google and Gmail, who continue to prefer their own way of doing things.<p>I'm looking at rebuilding a mail server on a new VPS pretty soon, and I'll happily set up SpamAssassin once again. It's great to see that it's still going strong.
> <i>Just like Gmail, SpamAssassin isn't the perfect filter for everybody right out of the box; it's really a framework that can be used to create that filter.</i><p>I find gmail to be about 99% accurate “out of the box”. With today’s technology and processing power I don’t see why that can’t also be true for other spam offerings.
When I was involved with the mechanics of an ESP delivering bulk mail for clients, I found SpamAssassin to be very useful for scanning e-mails even before they get sent in order to detect possible abuse.
Spamassassin is and has been perfectly adequate as long as you train your filter regularly with new spam.<p>And the more users you filter for, the more spam you should be feeding your filter, and regularly.<p>When the person says their gmail account is full of spam I take that with a grain of salt because I've been using gmail since it started, when it was like 5 invites only, and I use it as my throwaway account everywhere. I know for a fact that gmail does an excellent job at filtering spam.