Comment: [Original edit by JustinMason] reorder to emphasise easy way; update for new "nitemc" uid

...

Nightly MassCheck runs are currently the primary vehicle for evaluating the quality of rules checked into SpamAssassin. Every night contributors check out a specific revision of SpamAssassin from SVN and run MassCheck on their corpora. They upload their MassCheck logs to an rsync server, where lots of analysis takes place, visible through the RuleQaApp.

(There's also an older, clunkier version of the analysis scripts running on DanielQuinlan's server; see http://www.pathname.com/~corpus .)

There are three ways to do this: using a script we distribute, doing it yourself, or just uploading your corpus to our server.

How? (The Easiest Way)

If you rsync up your corpus to our server, as described in UploadedCorpora, it can be mass-checked there. Unfortunately you have to share your mail corpus with whoever might have access to that machine. (It's not expected that anyone will ever actually look, but the possibility is there nonetheless. If you are very concerned about privacy, you may want to strip out the more private mails before uploading, or mass-check on your own machine instead. This is what I do --jm)

Details for PMC members on how to set up new accounts for this are below, under '(Administrivia: setting up a nightly mass-check user on spamassassin.zones.apache.org)'.

How? (Less Easy, The Corpus-Nightly Script)

The corpus-nightly script in the masses/rule-qa/ directory of the SpamAssassin tree can be used to set up a mass-checker on your mail. Here's a step-by-step account of the process.

...

Note: the best time to run a mass-check is as soon as possible after 0900 UTC. Daylight saving time in some local timezones can be troublesome, so the script adjusts for this: if it detects that it was started during the 0800 UTC hour, it sleeps for an hour before continuing. You no longer have to worry about that.
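The adjustment described above can be sketched roughly as follows. This is only an illustration of the idea, not the code from corpus-nightly; the function name startup_delay is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper (not taken from corpus-nightly): print how many
# seconds to wait before starting, given the current UTC hour as the
# two-digit string printed by `date -u +%H`.
startup_delay() {
    case "$1" in
        08) echo 3600 ;;   # started during the 0800 UTC hour: wait one hour
        *)  echo 0 ;;      # any other hour: start right away
    esac
}

hour=$(date -u +%H)
echo "UTC hour $hour: would sleep $(startup_delay "$hour") seconds"
# A wrapper that really wanted to wait would then run:
#   sleep "$(startup_delay "$hour")"
```

The hour is compared as a string rather than as a number, so a leading zero such as "08" cannot be misread as an invalid octal value.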

How? (For Hackers, The DIY Version)

Here's more detail on that process, if you don't want to use the "corpus-nightly" script.

...

No Format
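# The revision to check is the second field on the last line of nightly.txt
REV=`tail -n 1 nightly.txt | awk '{print $2}'`
cd /path/to/spamassassin-checkout
# Bring the checkout to exactly that revision
svn update -r "$REV"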
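If you script this step, it may be worth checking that the extracted value really looks like a revision number before handing it to svn. The following is a minimal sketch under that assumption; the helper name latest_rev and the sample file contents are invented for illustration, and the real layout of nightly.txt may differ.

```shell
#!/bin/sh
# Hypothetical helper: print the second field of the last line of the
# given file, and fail if that field is not a plain number.
latest_rev() {
    rev=$(tail -n 1 "$1" | awk '{print $2}')
    case "$rev" in
        ''|*[!0-9]*)
            echo "no numeric revision found in $1" >&2
            return 1 ;;
        *)
            echo "$rev" ;;
    esac
}

# Demonstration on a made-up file, not a real nightly.txt.
demo=$(mktemp)
printf '%s\n' 'example 1000' 'example 1001' > "$demo"
latest_rev "$demo"
rm -f "$demo"
```

A caller could then write REV=$(latest_rev nightly.txt) || exit 1 before running the svn update.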

Alternatively, if you don't have Subversion set up, and would prefer to pick it up via rsync:

...

(The version of the tree available at rsync://rsync.spamassassin.org/tagged_builds/nightly_mass_check and .../weekly_mass_check already has this file included.)

...

)

...

There is one; if you rsync up your corpus to the buildbot server, as described in UploadedCorpora, it can be mass-checked there instead. Unfortunately you have to share your mail corpus with whoever might have access to that machine. (It's not expected that anyone will actually look, but if you are very concerned about privacy, you may be advised to strip out the more private mails before uploading, or mass-check on your own machine instead.)

Logs from the nightly mass-checks are visible at http://buildbot.spamassassin.org/bbmass/ .

(Administrivia: setting up a nightly mass-check user on spamassassin.zones.apache.org)

For PMC members who want to set up a user for the "Easiest" method: log in to the zone and run:

No Format
MCUSER=[username]
MCPWD=[random password]

sudo mkdir /export/home/nitemc/mc-nightly/$MCUSER
sudo chmod 1777 /export/home/nitemc/mc-nightly/$MCUSER
cd /export/home/nitemc/mc-nightly/$MCUSER
echo "$MCPWD" > rsync_password
chmod 600 rsync_password

# The g flag replaces every occurrence on a line; some lines mention MCUSER more than once
sed -e "s/MCUSER/$MCUSER/g" -e "s/MCPWD/$MCPWD/g" > .corpus

And paste in these lines:

No Format
opts_weekly="--net -j 8 --reuse --cache --cachedir=/tmpfs/aicache_nightly --cs_schedule_cache --cs_cachedir=/export/home/nitemc/cache --restart=500 ham:detect:/export/home/bbmass/uploadedcorpora/MCUSER/ham/* --after="-15552000" --tail=25000 spam:detect:/export/home/bbmass/uploadedcorpora/MCUSER/spam/*"
opts_nightly="--reuse --cache --cachedir=/tmpfs/aicache_nightly --cs_schedule_cache --cs_cachedir=/export/home/nitemc/cache --restart=500 ham:detect:/export/home/bbmass/uploadedcorpora/MCUSER/ham/* --after="-15552000" --tail=25000 spam:detect:/export/home/bbmass/uploadedcorpora/MCUSER/spam/*"
tmp=$HOME/tmp
tree=$HOME/svn
prefs_weekly=$HOME/user_prefs.weekly
prefs_nightly=$HOME/user_prefs.nightly
username=bb-MCUSER
password=__RSYNC_PASSWORD__
serverhost=spamassassin.zones.apache.org.:38899
clienthosts=__CLIENTHOSTS__
clienttree=nightlymc_MCUSER

Then press CTRL-D to end the input to sed.
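If you would rather not paste interactively, the same substitution can be fed from a here-document. The sketch below uses placeholder values, a shortened template, and a scratch file instead of a real .corpus; the example_targets line is made up to show a line that mentions MCUSER twice.

```shell
#!/bin/sh
# Placeholder values for illustration only.
MCUSER=exampleuser
MCPWD=examplepassword

# Write to a scratch file here rather than to a real .corpus.
out=$(mktemp)

# The quoted 'EOF' stops the shell from expanding anything inside the
# template, so only sed performs the substitutions.
sed -e "s/MCUSER/$MCUSER/g" -e "s/MCPWD/$MCPWD/g" > "$out" <<'EOF'
username=bb-MCUSER
clienttree=nightlymc_MCUSER
example_targets=ham:/corpora/MCUSER/ham spam:/corpora/MCUSER/spam
EOF

cat "$out"
```

Without the g flag, sed would replace only the first MCUSER on each line. Note also that a password containing "/" or "&" would confuse these sed expressions, so a purely alphanumeric random password is the simplest choice.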

No Format
mkdir tmp
svn co http://svn.apache.org/repos/asf/spamassassin/trunk svn
[accept certificate 'p'ermanently]

sudo chown -R nitemc .

In SVN trunk, edit build/automc/run_nightly, add the new user's username to the list, and check that file in.

...