How to Increase SpamAssassin Accuracy

Run a recent version

Regular updates of SpamAssassin 3.2.x rules stopped in 2008. Accuracy depends on more recent rules. Upgrade to 3.3.0 or newer.

Run sa-update daily

SpamAssassin packages often set this up already, but verify that sa-update runs from cron every day. New SpamAssassin rules are generated daily, and a daily run is the only way to get them.
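A minimal sketch of a daily update job follows. The function names are hypothetical, and the sketch assumes the documented sa-update exit statuses: 0 when new rules were installed and 1 when nothing new was available. It treats every other status as a failure.

```shell
#!/bin/sh
# Sketch of a daily update job; the function names are hypothetical.

# Map sa-update's exit status to an action.
# 0 = new rules were installed, 1 = no new rules were available.
# Any other status is treated as a failure in this sketch.
sa_update_action() {
  case "$1" in
    0) echo reload ;;
    1) echo none ;;
    *) echo error ;;
  esac
}

# Run sa-update once and print the action that should follow.
daily_sa_update() {
  sa-update
  sa_update_action "$?"
}
```

Call daily_sa_update from a script that cron runs once a day. When it prints "reload", restart spamd in whatever way your system uses so that the new rules are loaded.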

Enable network rules

Network rules are enabled by default, but check that they have not been turned off: disabling network rules (including DNS rules) makes SpamAssassin wrong on about 5 times as many emails. Network tests are disabled when spamassassin or spamd is run with the command line argument -L or --local. DNS rules are disabled by "dns_available no" in local.cf. Run a local caching DNS server for efficiency.

As of 2011-03-24, without network tests, SpamAssassin is wrong 5.35 times as often on non-spam, and 4.25 times as often on spam.
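One way to look for the two settings named above is to search the relevant files. This is only a sketch: the function names are hypothetical, and the files to search (for example a local.cf and whatever file holds your spamd options) depend on the installation.

```shell
# List config lines that switch DNS lookups off ("dns_available no").
# Prints grep's line-numbered matches, or nothing if there are none.
find_dns_disabled() {
  grep -nE '^[[:space:]]*dns_available[[:space:]]+no([[:space:]]|$)' "$@"
}

# List lines that pass -L or --local as a standalone option,
# for example in a file that holds the spamd command line options.
find_local_only_flag() {
  grep -nE -e '(^|[^[:alnum:]-])(-L|--local)([^[:alnum:]-]|$)' "$@"
}
```

For example, `find_dns_disabled /etc/mail/spamassassin/local.cf` checks one commonly used location, which is an assumption about your system. No output means the setting was not found in that file.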

Install Pyzor and Razor

These are two helper applications with useful (network) rules. If they're installed correctly, the debug output of SpamAssassin will include:

Apr 14 16:24:37.315 [4709] dbg: plugin: loading Mail::SpamAssassin::Plugin::Pyzor from @INC
Apr 14 16:24:37.318 [4709] dbg: pyzor: network tests on, attempting Pyzor
Apr 14 16:24:37.318 [4709] dbg: plugin: loading Mail::SpamAssassin::Plugin::Razor2 from @INC
Apr 14 16:24:37.381 [4709] dbg: razor2: razor2 is available, version 2.84
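To pick lines like these out of the full debug output, a small filter can help. The sketch below assumes the debug lines keep the "dbg: " prefix shown above. The function name and sample.eml are hypothetical.

```shell
# Keep only the debug lines that mention Pyzor or Razor2.
helper_debug_lines() {
  grep -Ei 'dbg: (pyzor|razor2|plugin: loading Mail::SpamAssassin::Plugin::(Pyzor|Razor2))'
}

# Usage (sample.eml is a hypothetical saved message; -D writes debug output to stderr):
#   spamassassin -D < sample.eml 2>&1 | helper_debug_lines
```

If the filter prints nothing, the plugins are probably not being loaded, or network tests are off.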

Verify AWL and the Bayesian classifier aren't poisoned

The auto-whitelist (AWL) and the Bayesian classifier, when automatically trained, can be trained incorrectly, which causes email to be scored wrongly. Verify that both provide useful scores: the AWL and BAYES_* tests should give positive scores to spam and negative scores to ham. Both can be disabled with:

use_auto_whitelist 0
use_bayes 0

To only disable automatic training of the Bayesian classifier:

bayes_auto_learn 0
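A rough way to check the Bayesian classifier is to tally which BAYES_* tests fired on mail you have already sorted by hand. The sketch below assumes the messages are stored one per file and still carry the headers SpamAssassin added. The function name is hypothetical, and the pattern also matches a BAYES_* name that happens to appear in a message body.

```shell
# bayes_tally FILE...: count how often each BAYES_* test name appears.
# Prints one "NAME COUNT" pair per line, sorted by name.
bayes_tally() {
  grep -ohE 'BAYES_[0-9]{2,3}' "$@" | sort | uniq -c | awk '{ print $2, $1 }'
}
```

Run it once on a folder of known spam and once on a folder of known ham. A healthy classifier should show mostly high-numbered tests such as BAYES_99 on spam and low-numbered tests such as BAYES_00 on ham. The reverse pattern suggests the database was trained incorrectly.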

Remove any SARE rules

SARE rules have not been updated in years, and are therefore actively harmful.

Enable Sought rules

SoughtRules is a custom rule set generated from spam 4 times a day by a SpamAssassin developer.

Use sa-learn to manually train the Bayesian classifier

If increasing the filtering accuracy of your own email is worth the time, manually sort it into ham and spam folders, and then use sa-learn to train the classifier on them. This also works for a group, provided its members are well trained: for example, they must not classify as spam the mailing lists they subscribed to but have lost interest in.
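The training step might look like the sketch below. It assumes the hand-sorted mail sits in two directories with one message per file. The function name and the SA_LEARN override are hypothetical conveniences, while --ham, --spam, --no-sync and --sync are sa-learn options.

```shell
# SA_LEARN can be overridden (for example SA_LEARN=echo) to preview
# the commands without running them; the override is a hypothetical convenience.
SA_LEARN=${SA_LEARN:-sa-learn}

# train_bayes HAM_DIR SPAM_DIR: learn from hand-sorted folders,
# then sync the Bayes database once at the end.
train_bayes() {
  "$SA_LEARN" --no-sync --ham "$1" &&
  "$SA_LEARN" --no-sync --spam "$2" &&
  "$SA_LEARN" --sync
}
```

For example, `train_bayes ~/Mail/ham ~/Mail/spam` trains on two hypothetical folders. Run it as the user whose Bayes database SpamAssassin actually reads.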

Pick a useful threshold

The default threshold is 5, and the scores of all of the tests are calculated against that value. A higher threshold results in fewer emails being considered spam, which reduces false positives but increases false negatives. Reducing the threshold below 5 is not recommended. The threshold is configured with:

required_score 5
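Before raising the threshold, it can help to see how many already-scanned messages would have been flagged at a candidate value. The sketch below assumes each message file carries an X-Spam-Status header with a "score=" field. The function name is hypothetical.

```shell
# count_flagged THRESHOLD FILE...: count messages whose recorded
# score is at or above THRESHOLD.
count_flagged() {
  threshold=$1
  shift
  grep -hoE '^X-Spam-Status: .*score=-?[0-9]+(\.[0-9]+)?' "$@" |
    sed 's/.*score=//' |
    awk -v t="$threshold" '$1 + 0 >= t + 0 { n++ } END { print n + 0 }'
}
```

Comparing the counts for 5 and for a higher value on a folder of known ham, and again on a folder of known spam, gives a rough picture of the trade-off between false positives and false negatives.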