Using mass-check To Test Rules

"mass-check" is a tool included in the 'masses' directory (MassesOverview), which can be found in the SVN repository (DownloadFromSvn). It tests rules for accuracy and hit rate. If you're writing custom rules, you really should use it to test them.

First, you need HandClassifiedCorpora. Let's say that's made up of two mbox folders, "/path/to/ham" and "/path/to/spam".

...

Mass-check reads a "user_prefs" file in "spamassassin/user_prefs". You need to create this file yourself; it will not be created for you.

Using network tests

...

For mass-checks for scoresets 1 or 3, which use network tests, you need to provide the "--net" switch. Ensure Net::DNS, Mail::SPF, Mail::DKIM (at least 0.31, preferably 0.36_5 or later), Razor (InstallingRazor), Pyzor (InstallingPyzor) and DCC (InstallingDCC) are installed.
Network tests are slow unless you use the -j switch to allow mass-check to start multiple parallel scanning processes.
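As a rough illustration of combining the two switches, the sketch below only assembles an argument list and prints it; it does not run anything. The "./mass-check" path, the job count of 4, and the helper name build_mass_check_argv are assumptions made for this example. The option spellings follow the option list on this page, and the targets use the <class>:<format>:<location> form described under "Usage: Targets".

```python
import shlex


def build_mass_check_argv(ham_mbox, spam_mbox, jobs=4, net=True):
    """Assemble an argument list for a mass-check run (nothing is executed)."""
    argv = ["./mass-check"]        # assumption: run from inside the masses directory
    if net:
        argv.append("--net")       # turn on network checks
    argv.append(f"-j={jobs}")      # number of parallel scanning processes
    # Targets are written as <class>:<format>:<location>.
    argv.append(f"ham:mbox:{ham_mbox}")
    argv.append(f"spam:mbox:{spam_mbox}")
    return argv


argv = build_mass_check_argv("/path/to/ham", "/path/to/spam")
print(shlex.join(argv))
```

Keeping the arguments in a list, rather than a single string, makes it easy to add or drop switches such as "--net" before handing the command to a shell or a process launcher.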
...
-c=file	set configuration/rules directory
-p=dir	set user-prefs directory
-f=file	read list of targets from <file>
-j=jobs	specify the number of processes to run simultaneously
--net	turn on network checks!
--mid	report Message-ID from each message
--debug	report debugging information
--progress	show progress updates during check
--rewrite=OUT	save rewritten message to OUT (default is /tmp/out)
--showdots	print a dot for each scanned message
--rules=RE	Only test rules matching the given regexp RE
--restart=N	restart all of the children after processing N messages
--deencap=RE	Extract SpamAssassin-encapsulated spam mails only if they were encapsulated by servers matching the regexp RE (default = extract all SpamAssassin-encapsulated mails)

log options
-o	write all logs to stdout
--loghits	log the text hit for patterns (useful for debugging)
--loguris	log the URIs found
--hamlog=log	use <log> as ham log ('ham.log' is default)
--spamlog=log	use <log> as spam log ('spam.log' is default)

message selection options
-n	no date sorting or spam/ham interleaving
--after=N	only test mails received after time_t N (negative values are an offset from current time, e.g. -86400 = last day) or after date as parsed by Time::ParseDate (e.g. '-6 months')
--before=N	same as --after, except received times are before time_t N
--cache	Use cached information about atime (generates files in corpus area)
--all	don't skip big messages
--head=N	only check first N ham and N spam (N messages if -n used)
--tail=N	only check last N ham and N spam (N messages if -n used)

simple target options (implies -o and no ham/spam classification)
--dir	subsequent targets are directories
--file	subsequent targets are files in RFC 822 format
--mbox	subsequent targets are mbox files
--mbx	subsequent targets are mbx files

Just left over functions we should remove at some point:
--bayes	report score from Bayesian classifier
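The numeric form of "--after" and "--before" can be read as follows: a non-negative N is an absolute time_t, and a negative N is an offset from the current time. The sketch below illustrates that reading only. The function names are hypothetical, date strings handled by Time::ParseDate are not covered, and the treatment of a message received exactly at the bound is an assumption.

```python
import time


def resolve_time_bound(n, now=None):
    """Turn an --after/--before style number into an absolute time_t.

    Negative values are offsets from the current time (e.g. -86400 = one
    day ago); non-negative values are already absolute.
    """
    if now is None:
        now = int(time.time())
    return now + n if n < 0 else n


def in_window(received, after=None, before=None, now=None):
    """Return True if a message's received time passes both bounds.

    Assumption: both bounds are exclusive.
    """
    if after is not None and received <= resolve_time_bound(after, now):
        return False
    if before is not None and received >= resolve_time_bound(before, now):
        return False
    return True


now = 1_700_000_000                      # fixed "current time" for the example
print(resolve_time_bound(-86400, now))   # one day before `now`
print(in_window(now - 3600, after=-86400, now=now))
```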

Usage: Targets

Non-option arguments are used as target names (mail files and folders). The target format is: <class>:<format>:<location>

class	is "spam" or "ham"
format	is "detect", "dir", "file", "mbx", or "mbox"
location	is a file or directory name. Globbing of ~ and * is supported.

"detect" is the easiest format to use. It assumes "mbox" for any file whose path contains the pattern "/\.mbox/i", "dir" for anything that is a directory, or "file" otherwise.
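To make the target syntax and the "detect" rule concrete, here is a minimal sketch that splits a target string and resolves "detect" to a specific format. It is not mass-check's own code. The function names are hypothetical, globbing of ~ and * is not handled, and checking for a directory before applying the ".mbox" pattern is an assumption about precedence.

```python
import os
import re

VALID_CLASSES = {"ham", "spam"}
VALID_FORMATS = {"detect", "dir", "file", "mbx", "mbox"}


def detect_format(location):
    """Guess a target format the way the "detect" rule is described."""
    if os.path.isdir(location):
        return "dir"                      # anything that is a directory
    if re.search(r"\.mbox", location, re.IGNORECASE):
        return "mbox"                     # path contains /\.mbox/i
    return "file"                         # everything else


def parse_target(spec):
    """Split <class>:<format>:<location> and resolve "detect"."""
    parts = spec.split(":", 2)            # location may itself contain ':'
    if len(parts) != 3:
        raise ValueError(f"expected <class>:<format>:<location>, got {spec!r}")
    msg_class, fmt, location = parts
    if msg_class not in VALID_CLASSES:
        raise ValueError(f"unknown class {msg_class!r}")
    if fmt not in VALID_FORMATS:
        raise ValueError(f"unknown format {fmt!r}")
    if fmt == "detect":
        fmt = detect_format(location)
    return msg_class, fmt, location


print(parse_target("spam:detect:/path/to/spam.mbox"))
print(parse_target("ham:mbox:/path/to/ham"))
```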

...
