SpamAssassin is back

324 points by l2dy over 6 years ago

21 comments

ThJ over 6 years ago

I have a mail server for my personal email that I've been maintaining for the better part of a decade. It started out as Qmail/Courier, changed to Postfix/Courier and finally Postfix/Dovecot, always with SpamAssassin as the spam filter.

For a while, it was configured to discard spam. Later, I configured it to put the spam in a special IMAP folder. I have the same system today, but on top of that, I have a background shell script that watches my mail folders and runs sa-learn on them. Anything that appears in the spam folder is learned as spam. Anything that appears elsewhere is learned as ham. Most of the time, this will be email that has already been classified correctly, but if I move an email, it will reclassify it.

Some time ago, it began leaking spam into the inbox. It turned out that I needed to make the Bayes scores more aggressive, and after I did that, it's been more or less perfect. More or less zero false positives and maybe a couple of false negatives per week.

I find that the good old Bayes classifier is still the most powerful tool in my spam filtering toolbox. You just have to be persistent and consistent in how you train it and tune it. For example, you shouldn't train it to classify legitimate newsletters as spam even if they are undesirable. Instead, you should unsubscribe and put those in the trash folder. I find that this considerably reduces the false positives.
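
A rough sketch of what such a training pass might look like, written in Python rather than shell. The Maildir location, the folder names (`.Spam`, `.Trash`, and so on) and the choice to skip some folders are assumptions for illustration, not the commenter's actual setup; the only thing taken from SpamAssassin itself is that `sa-learn` accepts `--spam` and `--ham` followed by a directory of messages.

```python
import subprocess
from pathlib import Path

# Hypothetical layout: a Maildir whose spam folder is named ".Spam".
MAILDIR = Path.home() / "Maildir"
SPAM_FOLDER = ".Spam"
# Assumption: these folders are not useful training material
# (unwanted but legitimate newsletters end up in the trash).
SKIP_FOLDERS = {".Trash", ".Drafts", ".Sent"}


def learn_commands(maildir: Path) -> list[list[str]]:
    """Build one sa-learn command per message directory.

    The spam folder is learned as spam, everything else as ham.
    """
    commands = []
    # The inbox is the Maildir root; other folders are dot-prefixed subdirectories.
    subfolders = sorted(
        p for p in maildir.iterdir() if p.is_dir() and p.name.startswith(".")
    )
    for folder in [maildir, *subfolders]:
        if folder.name in SKIP_FOLDERS:
            continue
        mode = "--spam" if folder.name == SPAM_FOLDER else "--ham"
        for sub in ("cur", "new"):
            target = folder / sub
            if target.is_dir():
                commands.append(["sa-learn", mode, str(target)])
    return commands


def run_once(maildir: Path = MAILDIR) -> None:
    """Run every training command once; a failure in one folder does not stop the rest."""
    for cmd in learn_commands(maildir):
        subprocess.run(cmd, check=False)
```

Calling `run_once()` from cron or a sleep loop would approximate the "background script" described above. Moving a message between folders changes which command picks it up on the next pass.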

perlgod over 6 years ago

I used to run Postfix + Dovecot + Rspamd with all the bells and whistles enabled [1], but I recently switched to OpenSMTPD + spamd on OpenBSD.

Unlike Rspamd, which has pluggable modules for everything under the sun (RBLs, word filters, Bayesian filtering + learning), spamd uses plain ol' graylisting with some PF integration to throttle spammers' connections to 1 character/second for maximum annoyance.

With Rspamd, I never got any spam in my Inbox. With spamd, I get maybe 1 spam mail every two weeks. To me, spamd's ridiculous simplicity is worth the tradeoff.

You do have to be careful with graylisting large mailers like Gmail, since they rarely retry the mail from the same IP address. For this, OpenBSD's smtpctl now has the spfwalk [2] command to whitelist the big guys. That's what I use in my current setup [3], which was linked here a few days ago.

[1] https://www.c0ffee.net/blog/mail-server-guide/
[2] https://poolp.org/posts/2018-01-08/spfwalk/
[3] https://github.com/cullum/dank-selfhosted
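
For readers unfamiliar with graylisting, here is a minimal in-memory sketch of the idea in Python. It is not how spamd is implemented; the class name, the 300-second delay and the whitelist handling are all assumptions chosen for illustration. It does show why a large mailer that retries from a different IP address keeps getting deferred, and why whitelisting its sending ranges helps.

```python
import time


class Greylist:
    """Toy greylist: defer the first attempt of each (ip, sender, recipient) triplet."""

    def __init__(self, min_delay=300, whitelist=()):
        self.min_delay = min_delay        # seconds a sender must wait before retrying
        self.first_seen = {}              # triplet -> timestamp of first attempt
        self.whitelist = set(whitelist)   # IPs that bypass greylisting entirely

    def check(self, ip, sender, recipient, now=None):
        """Return "accept" or "tempfail" (i.e. answer with a 4xx and let the client retry)."""
        now = time.time() if now is None else now
        if ip in self.whitelist:
            return "accept"
        key = (ip, sender, recipient)
        # Remember when this triplet was first seen; later attempts reuse that time.
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.min_delay:
            return "accept"
        return "tempfail"


greylist = Greylist(min_delay=300, whitelist={"192.0.2.10"})
print(greylist.check("198.51.100.7", "a@example.org", "me@example.net", now=0))    # first try
print(greylist.check("198.51.100.7", "a@example.org", "me@example.net", now=400))  # retry, same IP
print(greylist.check("198.51.100.8", "a@example.org", "me@example.net", now=400))  # retry, new IP
```

The third call starts a fresh triplet because the IP changed, so the message is deferred again even though the sender did retry.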

codetrotter over 6 years ago

> The current code is getting old, and there is interest in applying deep-learning techniques to the spam-detection problem.

Yes please! I archive all my mail, both desired and spam mail, with the intention of using this data to train a neural network that will be able to classify mail as spam, desired newsletter or desired personal mail.

decasteve over 6 years ago

> some sites are using it to detect spam submitted in web forms, for example.

I actually have the web server send the form contents via email because my email server runs SpamAssassin and it has done a great job catching form spam.
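
A minimal sketch of that pattern in Python, assuming a contact form with name, email and message fields. The addresses and the `localhost` relay are hypothetical placeholders; the point is only that the submission becomes an ordinary message, so whatever filtering the mail server applies to inbound mail also applies to form posts.

```python
import smtplib
from email.message import EmailMessage
from email.utils import formataddr

# Hypothetical addresses for illustration.
FORM_SENDER = "webform@example.org"
FORM_RECIPIENT = "me@example.org"


def _one_line(value: str) -> str:
    """Collapse CR/LF so user input cannot inject extra headers."""
    return value.replace("\r", " ").replace("\n", " ").strip()


def form_to_message(name: str, email: str, body: str) -> EmailMessage:
    """Wrap a form submission in an email message."""
    msg = EmailMessage()
    msg["From"] = FORM_SENDER
    msg["To"] = FORM_RECIPIENT
    msg["Subject"] = "Contact form submission"
    msg["Reply-To"] = formataddr((_one_line(name), _one_line(email)))
    msg.set_content(body)
    return msg


def deliver(msg: EmailMessage, host: str = "localhost") -> None:
    """Hand the message to the mail server, which runs its usual spam checks."""
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)
```

Putting the submitter in `Reply-To` rather than `From` is a design choice in this sketch, so that the message does not claim to originate from a domain the web server is not authorized to send for.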

tyingq over 6 years ago

I let Google filter my spam these days, but back when I did it myself I liked popfile better because it could handle general classification versus just spam or not-spam.

So I could train it to move mail into a "High Priority" bucket, or "Not Spam, But Promotional" bucket, etc.

Curious if SpamAssassin can do that now.

I was also curious about the "freemail antiforge" feature mentioned in the article, but couldn't find much about it.

breakall over 6 years ago

Spam detection on my personal email server took a huge leap forward when I got SpamAssassin correctly configured to use the DNS blocklists (URIBL, etc.). I had to set up a local DNS server because the blocklists weren't responding to requests coming via my hosting provider's DNS, and the scores had to be tweaked over a few weeks, but now it's working great. The content rules like "hi my dear", "request for money", and "all caps" are still in there, but the blocklists do the heavy lifting.
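
For context, an IP-based DNS blocklist lookup is just a DNS query for a specially built name: the IPv4 octets are reversed and the list's zone is appended, and any answer means "listed". The Python sketch below shows that mechanism; the zone `dnsbl.example.org` is a placeholder, not a real list. URI blocklists work similarly but query a domain name taken from a link in the message instead of the connecting IP.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up, e.g. 192.0.2.99 -> 99.2.0.192.<zone>."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str = "dnsbl.example.org") -> bool:
    """Return True if the blocklist zone returns any A record for this IP."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # Resolution failed (typically NXDOMAIN): treated here as "not listed".
        return False
    return True


print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

The lookup goes through whatever resolver the host is configured to use, which is the piece the commenter had to replace with a local DNS server.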

z3t4 over 6 years ago

It can be exciting when your filter blocks a ton of spam, but then you get a few false positives, and of course they happened to be very important mails ... I think the spam filter needs to be plugged into the MTA, so that when a message is flagged as spam, the MTA can tell the sender that hey, this message got flagged as spam.

linsomniac over 6 years ago

One thing I really liked about SpamAssassin, given how prevalent it was, was that I could set hashcash on my outbound e-mails and basically eliminate the chances of my e-mail being caught as spam. This was probably a decade ago (I've since moved to Gmail), but it would take 10-60 seconds per recipient to generate the hashcash, and SpamAssassin would give those messages a large positive weight.

I'm kind of surprised that it never caught on.

LinuxBender over 6 years ago

SA is good for small orgs. For my own server, I just use regex rules in Postfix called S25R [1]. It has been sufficient for me and keeps the run queue very low.

I implemented S25R regex rules in a medium-sized company that had a very bad spam problem due to old Outlook setups replying to all the spammers with OOO replies. They were using SA and tried to keep up on rule tuning, but it was a losing battle. The OS run queue on the 6 inbound servers averaged 6 and would sometimes peak at 14+. I switched to S25R regex rules and the run queue dropped to 0.2. I did reject some "valid" emails from folks that left the company and were running their own business from their home cable modems, but eventually whitelisted some of them. The employees were very happy with the change. They went from receiving 2k+ spam msgs per day (each) to less than a dozen.

[1] http://www.gabacho-net.jp/en/anti-spam/anti-spam-system.html
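
The general idea behind this kind of filtering is to look at the connecting client's reverse DNS name and reject it when it is missing or looks like a dynamically assigned end-user line rather than a mail server. The Python sketch below illustrates that idea only. The two patterns are made up for this example and are NOT the published S25R rules; the page linked in [1] has the real rule set.

```python
import re
from typing import Optional

# Illustrative patterns only (hypothetical, not the S25R rule set).
END_USER_PATTERNS = [
    # Hostname embeds something that looks like an IP address, e.g. c-203-0-113-45.
    re.compile(r"\d+[-.]\d+[-.]\d+[-.]\d+"),
    # Leftmost label contains a typical access-line keyword.
    re.compile(r"^[^.]*(dhcp|dialup|ppp|adsl|cable|dyn)", re.IGNORECASE),
]


def looks_like_end_user(rdns: Optional[str]) -> bool:
    """Return True if the client has no reverse DNS or its name looks dynamic."""
    if not rdns:
        return True
    return any(pattern.search(rdns) for pattern in END_USER_PATTERNS)


for name in (None, "c-203-0-113-45.hsd1.example.net", "mail.example.org"):
    print(name, looks_like_end_user(name))
```

A legitimate sender on a home cable connection would match the first pattern, which is the kind of false positive the comment describes having to whitelist.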

aduitsis over 6 years ago

Extremely glad to see a large project written in Perl pick up steam again.

creeble over 6 years ago

Hmm. Maybe it's time to give SA a try again.

I had very little luck in keeping SA performing well over time. It would go for a few months doing a decent job, then just seem to fall off the tracks for a new particular reign of spam, even with graylisting on.

I finally gave up and have been using... a Barracuda box. I hate to say it (because they're stupid expensive for all but the smallest box), but it has worked incredibly well, with zero overhead. I've gone for over a year without so much as logging in to the box.

Ultimately, I think the 'cuda is just running SA with Barracuda's collective filtering added on, but it sure works for me and my dozen or so users.

Maybe Barracuda doesn't sell that many spam boxes (they have quite a few other products), but I can say that it has sure worked for me.

If I could keep SA tuned well with minimal effort, it would be worth the savings, however.

defanor over 6 years ago

Not directly related to SA being back, but since spam combat stories and approaches are shared here: I used to run SA with Postfix, but it was quite a memory hog, occasionally leading to OOM killer raging. Half a year ago I disabled it and revised the regular Postfix (+ postscreen) settings, consulting a couple of helpful articles [1,2], and haven't gotten any "real" spam (i.e., not counting Bitbucket) since.

Likely the incoming spam varies for different people and their mailboxes, but tweaking those standard settings can be surprisingly effective.

[1] http://rob0.nodns4.us/postscreen.html
[2] http://jimsun.linxnet.com/misc/postfix-anti-UCE.txt

linker3000 over 6 years ago

A big shout out to Julian Field et al. at MailScanner. I don't manage any mail servers these days, but going back around 15 years, it was at the heart of several setups.

https://www.mailscanner.info/

And, yes, it uses SpamAssassin.

jrnichols over 6 years ago

Between SpamAssassin, amavis, and postscreen (mainly postscreen) I rarely see any spam anymore.

But I do see that development has been kind of slow, and not just with SA: most everything email related. It truly frightens me just how many people gave up and handed all of their email over to Google and Gmail, who continue to prefer their way of doing things.

I'm looking at rebuilding a mail server on a new VPS pretty soon, and I'll happily set up SpamAssassin once again. It's great to see that it's still going strong.

leowinterde over 6 years ago

I'm running rspamd on some servers. It filters fine, and the DMARC reports are an awesome feature. A comparison would be interesting in the future.

dev_dull over 6 years ago

> Just like Gmail, SpamAssassin isn't the perfect filter for everybody right out of the box; it's really a framework that can be used to create that filter.

I find Gmail to be about 99% accurate “out of the box”. With today’s technology and processing power I don’t see why that can’t also be true for other spam offerings.

kokey over 6 years ago

When I was involved with the mechanics of an ESP delivering bulk mail for clients, I found SpamAssassin to be very useful for scanning e-mails even before they get sent in order to detect possible abuse.

INTPenis over 6 years ago

SpamAssassin is and has been perfectly adequate as long as you train your filter regularly with new spam.

And the more users you filter for, the more spam you should be feeding your filter, and regularly.

When the person says their Gmail account is full of spam I take that with a grain of salt, because I've been using Gmail since it started, when it was like 5 invites only, and I use it as my throwaway account everywhere. I know for a fact that Gmail does an excellent job at filtering spam.

stcredzero over 6 years ago

I'm picturing a new mascot. Blocky in the way Spongebob is, but ninja themed and made out of canned meat product.

krupan over 6 years ago

Wouldn't it be awesome if you could filter your Facebook/Twitter/whatever feed through SpamAssassin too?

appleflaxen over 6 years ago

I miss Blue Frog.