Using mass-check To Test Rules

"mass-check" is a tool included with the SpamAssassin source distribution to test rules for accuracy and hit-rate. If you're writing custom rules, you really should use this to test them.

First, you need HandClassifiedCorpora. Let's say that's divided into two maildir folders, "/path/to/ham" and "/path/to/spam".

Next, cd into the "masses" directory of the source distribution and run mass-check against both folders:

    cd masses
    ./mass-check --progress \
              ham:dir:/path/to/ham \
              spam:dir:/path/to/spam

This will create two files, "ham.log" and "spam.log", recording which rules from the rules dir "../rules" hit each message in that corpus.
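As a quick sanity check, the number of result lines in each log can be compared with the number of messages in the corresponding folder. The sketch below assumes that each log holds one result line per message and that header lines start with "#"; the function name count_results is made up for illustration and is not part of mass-check.

```shell
# Hypothetical helper: count the result lines in a mass-check log.
# Assumption: one result line per scanned message, and comment/header
# lines begin with '#'.
count_results() {
    grep -vc '^#' "$1"
}
```

For example, "count_results ham.log" should print roughly the number of messages in /path/to/ham if the whole folder was scanned.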

mass-check also takes other options to control whether network tests are run, whether multiple processes are run in parallel, how the output is presented, etc.; read the comments at the top of the file for details. Here are some key bits:

Using network tests

For mass-checks for scoresets 1 or 3, which use network tests, you need to provide the --net switch. Ensure Net::DNS, Mail::SPF::Query, Razor, Pyzor and DCC are installed.

Network tests are slow unless you use the -j switch to allow mass-check to start multiple parallel scanning processes.
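The two switches can be combined in one run. The sketch below wraps the earlier command in a shell function so the process count is easy to change; the function name run_net_mass_check is made up for illustration, and it assumes that -j takes the number of processes as its argument (4 is an arbitrary default, not a recommendation).

```shell
# Hypothetical wrapper: run mass-check with network tests in parallel.
# Assumption: -j takes the number of scanning processes as its argument.
# Run this from inside the "masses" directory.
run_net_mass_check() {
    njobs="${1:-4}"    # number of parallel processes; 4 is arbitrary
    ./mass-check --progress --net -j "$njobs" \
        ham:dir:/path/to/ham \
        spam:dir:/path/to/spam
}
```

Calling "run_net_mass_check 8" would then start a network-enabled run with eight processes.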

Using Bayes

This is controlled using the mass-check configuration file. Do this:

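    cd masses
    mkdir -p spamassassin          # -p: no error if the directory already exists
    rm -f spamassassin/bayes*      # -f: no error if there are no old Bayes files
    echo "use_bayes 1" > spamassassin/user_prefs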

Once mass-check completes

The next step is to run hit-frequencies: see HitFrequencies for details.

Further discussion concerning the use of mass-check can be found at the other SA wiki at http://www.exit0.us/index.php/Against%20a%20Corpus

