
Using mass-check To Test Rules

"mass-check" is a tool included in the 'masses' directory (MassesOverview), which can be found in the SVN repository (DownloadFromSvn), to test rules for accuracy and hit rate. If you're writing custom rules, you really should use this to test them.

First, you need HandClassifiedCorpora. Let's say that's made up of two mbox folders, "/path/to/ham" and "/path/to/spam".

...

Mass-check reads its preferences from the file "spamassassin/user_prefs". You need to create this file yourself; it will not be created for you.

To test your own rules, you'll need to put them in this file and include a line containing "allow_user_rules 1".
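
As a minimal sketch of this step, the following creates the user_prefs file with the required line and one custom rule. The rule name LOCAL_TEST_PHRASE, its pattern, and its score are hypothetical placeholders, and the sketch assumes it is run from the "masses" directory. Note that it overwrites any existing user_prefs file.

```shell
#!/bin/sh
# Sketch: create the user_prefs file that mass-check reads.
# Assumption: the current directory is the "masses" directory.
mkdir -p spamassassin

# LOCAL_TEST_PHRASE, its pattern and its score are hypothetical examples;
# replace them with the rules you actually want to test.
# Warning: '>' overwrites an existing spamassassin/user_prefs.
cat > spamassassin/user_prefs <<'EOF'
allow_user_rules 1

body     LOCAL_TEST_PHRASE  /example test phrase/i
describe LOCAL_TEST_PHRASE  Hypothetical rule used only for illustration
score    LOCAL_TEST_PHRASE  1.0
EOF
```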

Using network tests

...

For mass-checks for scoresets 1 or 3, which use network tests, you need to provide the --net switch. Ensure Net::DNS, Mail::SPF, Mail::DKIM (at least 0.31, preferably 0.36_5 or later), Razor (InstallingRazor), Pyzor (InstallingPyzor) and DCC (InstallingDCC) are installed.

Network tests are slow unless you use the -j switch to allow mass-check to start multiple parallel scanning processes.
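
For illustration, a network-enabled run with parallel jobs might be assembled as below. The job count of 4 is an arbitrary assumption, the corpus paths are the placeholders used earlier on this page, and the command is only printed unless a mass-check script is found in the current directory.

```shell
#!/bin/sh
# Sketch: compose a network-enabled mass-check run with parallel jobs.
JOBS=4                      # assumed value; tune to your machine
HAM=/path/to/ham            # placeholder corpus locations from this page
SPAM=/path/to/spam

CMD="./mass-check --net -j=$JOBS ham:mbox:$HAM spam:mbox:$SPAM"
echo "$CMD"

# Run it only when mass-check is actually present (i.e. inside "masses").
# The unquoted expansion relies on the paths containing no spaces.
if [ -x ./mass-check ]; then
    $CMD
fi
```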

...

    cd masses
    mkdir -p spamassassin
    # remove any existing Bayes database files
    rm -f spamassassin/bayes*
    echo "use_bayes 1" >> spamassassin/user_prefs

or to turn it off:

    cd masses
    mkdir -p spamassassin
    echo "use_bayes 0" >> spamassassin/user_prefs
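
Because the commands above append to user_prefs, running them more than once can leave several conflicting use_bayes lines in the file. One possible way to avoid this, sketched here with a hypothetical helper named set_use_bayes, is to drop any existing use_bayes line before appending the new one. The helper only manages that one line; removing old bayes files is still a separate step, and the sketch assumes it is run from the "masses" directory.

```shell
#!/bin/sh
# Sketch: set use_bayes in user_prefs without accumulating duplicate lines.
# set_use_bayes is a hypothetical helper, not part of mass-check.
# Assumption: the current directory is the "masses" directory.
set_use_bayes() {
    value=$1                               # 1 = on, 0 = off
    prefs=spamassassin/user_prefs
    mkdir -p spamassassin
    touch "$prefs"
    # Keep every line except an existing use_bayes setting.
    # grep exits non-zero when nothing is left to print, hence '|| true'.
    grep -v '^use_bayes[[:space:]]' "$prefs" > "$prefs.tmp" || true
    echo "use_bayes $value" >> "$prefs.tmp"
    mv "$prefs.tmp" "$prefs"
}

set_use_bayes 1
```

Calling set_use_bayes 0 afterwards replaces the setting rather than adding a second line.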

Once mass-check completes


If you're using mass-check to test your own rules, the next step is to run hit-frequencies; see HitFrequencies for details. Alternatively, if you're submitting data for a new scoreset, see RescoreMassCheck, or NightlyMassCheck for the nightly QA test.


Usage


    mass-check [options] target ...


    -c=file        set configuration/rules directory
    -p=dir         set user-prefs directory
    -f=file        read list of targets from <file>
    -j=jobs        specify the number of processes to run simultaneously
    --net          turn on network checks!
    --mid          report Message-ID from each message
    --debug        report debugging information
    --progress     show progress updates during check
    --rewrite=OUT  save rewritten message to OUT (default is /tmp/out)
    --showdots     print a dot for each scanned message
    --rules=RE     only test rules matching the given regexp RE
    --restart=N    restart all of the children after processing N messages
    --deencap=RE   extract SpamAssassin-encapsulated spam mails only if they
                   were encapsulated by servers matching the regexp RE
                   (default = extract all SpamAssassin-encapsulated mails)

log options

    -o             write all logs to stdout
    --loghits      log the text hit for patterns (useful for debugging)
    --loguris      log the URIs found
    --hamlog=log   use <log> as ham log ('ham.log' is default)
    --spamlog=log  use <log> as spam log ('spam.log' is default)

message selection options

    -n          no date sorting or spam/ham interleaving
    --after=N   only test mails received after time_t N (negative values are
                an offset from current time, e.g. -86400 = last day) or after
                date as parsed by Time::ParseDate (e.g. '-6 months')
    --before=N  same as --after, except received times are before time_t N
    --cache     use cached information about atime (generates files in
                corpus area)
    --all       don't skip big messages
    --head=N    only check first N ham and N spam (N messages if -n used)
    --tail=N    only check last N ham and N spam (N messages if -n used)
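
To make the time_t arithmetic for --after and --before concrete, this sketch computes the absolute timestamp that corresponds to the relative offset -86400. It assumes a date command that supports +%s (common, though not required by POSIX), and it only prints the option strings rather than running mass-check.

```shell
#!/bin/sh
# Sketch: relate a negative --after offset to an absolute time_t value.
NOW=$(date +%s)                 # current time as seconds since the epoch
ONE_DAY=86400                   # 24 * 60 * 60 seconds
CUTOFF=$((NOW - ONE_DAY))       # absolute time_t for "one day ago"

# Per the option description, these two forms select (almost) the same
# messages; they differ only by the seconds that pass between computing
# NOW and the actual run.
echo "--after=-$ONE_DAY"
echo "--after=$CUTOFF"
```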

simple target options (implies -o and no ham/spam classification)

    --dir   subsequent targets are directories
    --file  subsequent targets are files in RFC 822 format
    --mbox  subsequent targets are mbox files
    --mbx   subsequent targets are mbx files

Just left over functions we should remove at some point:

    --bayes  report score from Bayesian classifier

Usage: Targets

Non-option arguments are used as target names (mail files and folders). The target format is: <class>:<format>:<location>

    class     is "spam" or "ham"
    format    is "detect", "dir", "file", "mbx", or "mbox"
    location  is a file or directory name. Globbing of ~ and * is supported.
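
As a small illustration of the <class>:<format>:<location> syntax, the sketch below composes target strings for the two example corpora from the start of this page. make_target is a hypothetical helper, not part of mass-check; it only checks that the class and format are among the values listed here.

```shell
#!/bin/sh
# Sketch: build mass-check target strings of the form <class>:<format>:<location>.
# make_target is a hypothetical helper, not part of mass-check.
make_target() {
    class=$1
    format=$2
    location=$3
    case $class in
        spam|ham) ;;
        *) echo "unknown class: $class" >&2; return 1 ;;
    esac
    case $format in
        detect|dir|file|mbx|mbox) ;;
        *) echo "unknown format: $format" >&2; return 1 ;;
    esac
    echo "$class:$format:$location"
}

# Placeholder corpus paths taken from the example at the top of this page.
make_target ham mbox /path/to/ham
make_target spam mbox /path/to/spam
```

The two printed strings can then be passed to mass-check as its target arguments.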

"detect" is the easiest format to use. This assumes "mbox" for any file whose path contains the pattern "/\.mbox/i", "file" for anything that is not a directory, or "directory" otherwise.