Nightly Mass-Check Runs
What?
Nightly MassCheck runs are currently the primary vehicle for evaluating the quality of rules checked into SpamAssassin. Every night contributors check out a specific revision of SpamAssassin from SVN and run MassCheck on their corpora. They upload their MassCheck logs to an rsync server, where the RuleQaApp analyses the logs.
(There's also an older, clunkier version of the analysis scripts running on DanielQuinlan's server; see http://www.pathname.com/~corpus .)
How?
The corpus-nightly script in the masses/rule-qa/ directory of the SpamAssassin tree can be used to set this up. It's probably not very well documented (WeLoveVolunteers), but it does work.
You'll also need to ask for RsyncAccounts and make sure you get a "nightly" account rather than a release-time account.
How? (in more detail)
Get hold of http://rsync.spamassassin.org/$VERS-versions.txt, where $VERS is either "nightly" or "weekly". "nightly" is updated a little before 0900 UTC Sunday through Friday. "weekly" is updated at the same time on Saturdays, and is meant to be a net-enabled run. In other words, wait until at least 0900 UTC before starting a corpus run. The above files are also available via the standard rsync system.
Get a "nightly" rsync account (see 'How?' above).
Each $VERS-versions.txt file consists of lines of the form "date <tab> revision <LF>", with the date in YYYY-MM-DD format and the revision being the value that comes out of SVN. New lines are added to the bottom of the file.
So: grab the file, find the right line (you can either grep for the date, or just take the last line of the file), and use the second column to update your SpamAssassin checkout to that revision. For example:
    REV=`tail -1 nightly-versions.txt | awk '{print $2}'`
    cd /path/to/spamassassin-checkout
    svn update -r $REV
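If you would rather look the revision up by date than rely on the last line, the lookup could be sketched as below. This is a minimal sketch, not part of the SpamAssassin tree: the function name rev_for_date is made up for illustration, and it assumes the versions file has already been downloaded and follows the "date <tab> revision" layout described above.

```shell
# Hypothetical helper (not shipped with SpamAssassin).
# Print the SVN revision recorded for a given date in a versions file.
# Usage: rev_for_date FILE [YYYY-MM-DD]   (the date defaults to today, UTC)
rev_for_date() {
    file=$1
    day=${2:-$(date -u +%Y-%m-%d)}
    # Each line is "date<TAB>revision". If a date appears more than once,
    # keep the last match, since new lines are appended at the bottom.
    # Exit non-zero when the date has no entry at all.
    awk -v d="$day" '
        $1 == d { rev = $2 }
        END { if (rev == "") exit 1; print rev }
    ' "$file"
}
```

For example, REV=$(rev_for_date nightly-versions.txt) would pick up today's revision. The function returns a non-zero status when the date has no entry, which usually means the file has not been updated yet and the run should wait.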
Alternatively, if you don't have Subversion set up, and would prefer to pick it up via rsync:
    rsync -vrz --delete \
      rsync://rsync.spamassassin.org/tagged_builds/nightly_mass_check .
(replace "nightly" with "weekly" for the weekly builds.)
Then use that build of SpamAssassin to perform a MassCheck, and when that completes, upload the results as per the instructions in http://spamassassin.org/dist/masses/CORPUS_SUBMIT_NIGHTLY .
Note: The result log-files must have an SVN revision line in the output, like so:
    # mass-check results from jm@jalapeno, on Mon Nov 21 09:10:15 UTC 2005
    # M:SA version 3.2.0-r322462
    # SVN revision: 345462
    # Perl version: 5.008003 on i386-linux-thread-multi
    # Switches: '--progress --tail=20000 -j 4 -f /home/jm/cor/tgts'
If that line isn't present, the rule-QA reporting system cannot correlate the logs with the source revision, and instead ignores them.
If you do not use SVN to retrieve the SpamAssassin source tree, this may not be present, since "mass-check" cannot use "svn info" to get the current revision data. However, there's a workaround. Before running "mass-check", run "svn info" and redirect the output into a file called "svninfo.tmp" in the "masses" directory. Mass-check will read that and use its data for the "SVN revision:" line.
(The version of the tree available at rsync://rsync.spamassassin.org/tagged_builds/nightly_mass_check and .../weekly_mass_check already has this file included.)
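Since logs without the revision line are silently ignored, it may be worth checking for it before uploading. The following is a minimal sketch under stated assumptions: log_svn_revision is a made-up name, not a SpamAssassin tool, and it assumes the line looks exactly like the "# SVN revision: 345462" example above.

```shell
# Hypothetical pre-upload check (not part of mass-check itself).
# Print the revision from the "# SVN revision: NNN" line of a mass-check
# log, and return non-zero if no such line is present.
# Usage: log_svn_revision LOGFILE
log_svn_revision() {
    # Stop at the first matching line; the END rule turns "not found"
    # into a failing exit status.
    awk '
        /^# SVN revision: [0-9]+/ { print $4; found = 1; exit }
        END { exit !found }
    ' "$1"
}
```

A check such as log_svn_revision ham.log || echo "missing SVN revision line" (where ham.log is a placeholder log name) would then flag a log that the rule-QA system is going to ignore.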
An Easier Way
There is one: if you rsync your corpus up to the buildbot server, as described in UploadedCorpora, it can be mass-checked there instead. Unfortunately, you have to share your mail corpus with whoever might have access to that machine. (It's not expected that anyone will actually look, but if you are very concerned about privacy, you may want to strip out the more private mails before uploading, or mass-check on your own machine instead.)
Logs from the nightly mass-checks are visible at http://buildbot.spamassassin.org/bbmass/ .
(Administrivia: setting up a nightly mass-check user on buildbot.spamassassin.org)
For PMC members who want to set up a user for this: log in to the zone and run:
    MCUSER=[username]
    MCPWD=[random password]
    sudo mkdir /home/bbmass/mc-nightly/$MCUSER
    sudo chmod 1777 /home/bbmass/mc-nightly/$MCUSER
    cd /home/bbmass/mc-nightly/$MCUSER
    sed -e "s/MCUSER/$MCUSER/" -e "s/MCPWD/$MCPWD/" > .corpus
And paste in these lines:
    opts_weekly="--restart=500 --tail=15000 --net -j 8 -f /home/bbmass/mc-nightly/targets.MCUSER"
    opts_nightly="--restart=500 --tail=15000 -f /home/bbmass/mc-nightly/targets.MCUSER"
    tmp=$HOME/tmp
    tree=$HOME/svn
    prefs_weekly=$HOME/user_prefs.weekly
    prefs_nightly=$HOME/user_prefs.nightly
    username=bb-MCUSER
    password=MCPWD
Then press CTRL-D to end the input (sed is reading the pasted lines from standard input).
    mkdir tmp
    svn co http://svn.apache.org/repos/asf/spamassassin/trunk svn
    [accept certificate 'p'ermanently]
    sudo chown -R bbmass .
In SVN trunk, edit build/automc/run_nightly, add their username to the list, and check that file in.
Then in the zone, as the uid "automc", do this:
    cd /home/automc/svn/spamassassin
    svn up
so that the latest script is in place when cron runs.
Finally, edit /home/corpus-rsync/secrets and add a line to the end, like so:
$MCUSER:$MCPWD
e.g. if MCUSER was "bb-jm" and the generated MCPWD was "Wi0FdPWg":
bb-jm:Wi0FdPWg
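The setup steps above only say that MCPWD should be a random password. One way to generate it is sketched below. The function name gen_mcpwd is an assumption for illustration, not an existing script on the zone. The alphabet is limited to letters and digits because characters such as "/" and "&" would be misinterpreted by the sed "s/MCPWD/$MCPWD/" substitution used to write .corpus.

```shell
# Hypothetical helper: print an 8-character alphanumeric password.
gen_mcpwd() {
    # Read a fixed amount of random data, drop every byte that is not a
    # letter or digit, and keep the first 8 characters. 256 random bytes
    # leave about 60 usable characters on average, far more than needed.
    head -c 256 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9' | cut -c1-8
}
```

It could be used as MCPWD=$(gen_mcpwd) in place of the [random password] placeholder.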