Nightly Mass-Check Runs
What?
Nightly MassCheck runs are how people submit data on the effectiveness of the current rules against their recent spam and ham. This data is used to generate the rule scores that determine the effectiveness of SpamAssassin (distributed via sa-update), and to evaluate rules via the RuleQaApp. The accuracy of SpamAssassin is directly related to the number of people contributing to nightly MassChecks.
Usually a script run from cron automatically downloads the latest development version of SpamAssassin, runs it against your spam and ham, and then uploads a log of the results. The log contains one line per email, listing the SpamAssassin rules that email hit.
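Because each log holds one line per email, a quick line count gives a rough sanity check on how many messages a run covered. The helper below is a minimal sketch, not part of auto-mass-check: the function name `count_messages` is hypothetical, and it assumes that header or comment lines in the log start with `#`, which you should confirm against your own log files.

```shell
#!/bin/sh
# count_messages LOG: print how many messages a mass-check log covers.
# The log holds one line per email. ASSUMPTION: lines starting with '#'
# are header comments, so they are excluded along with blank lines.
count_messages() {
    # -c counts lines, -v inverts the match: count lines matching neither pattern
    grep -c -v -e '^#' -e '^[[:space:]]*$' "$1"
}
```

For example, `count_messages ham-username.log` should print a number close to the number of messages in your ham folder; a large gap suggests the run did not read the folder you intended.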
Currently, sa-updates are not even happening because we are not getting enough mass-check data submitted (since December 2010).
How?
- Send an email to private@spamassassin.apache.org requesting an rsync account for nightly mass-checks.
- Ensure SpamAssassin and its plugins are fully installed.
- Download the auto-mass-check script (browse repo):
git clone git://git.fedorahosted.org/auto-mass-check.git
- Copy auto-mass-check/auto-mass-check.sh to ~/bin/
- Copy auto-mass-check/auto-mass-check.cf to ~/.auto-mass-check.cf
- Modify ~/.auto-mass-check.cf to point at your ham and spam folders. Be sure to set the folder format correctly: mbox (mbox) or Maildir (dir). Leave the RSYNC options unchanged for now, because you will be running auto-mass-check in test mode at first.
  - Optionally set TRUSTED_NETWORKS and INTERNAL_NETWORKS in ~/.auto-mass-check.cf
- Run auto-mass-check.sh.
- Look in ~/masscheckwork/nightly_mass_check/ for ham-*.log and spam-*.log files. (Or weekly_mass_check on Saturday.)
  - Are the filenames good? They should be named something like ham-username.log or ham-net-username.log.
  - Read CorpusCleaning and HandClassifiedCorpora for guidelines on how to identify ham in your spam folder and spam in your ham folder, and which messages should simply be deleted.
  - If you move/delete messages, do not forget to "Compact Folder" to be sure they are actually gone.
  - Repeat auto-mass-check until you are certain both folders are cleaned.
- Edit ~/.auto-mass-check.cf and set RSYNC_USERNAME and RSYNC_PASSWORD to the values you received for the rsync account requested in the first step.
- Run auto-mass-check.sh, which will upload your results.
- Ask a more experienced participant (probably the person who recruited you) to check your results on the server. They can see the uploaded log files by running a command like:
rsync --old-d username@rsync.spamassassin.org::corpus/
- If your upload looks good, then you're probably ready to automate nightly checks. Configure auto-mass-check to run as a cron job as your non-root user at or after 9AM UTC.
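The scheduling step can be sketched as a small helper that prints the crontab text for you to review before installing it. This is only an illustration under stated assumptions: the function name `nightly_cron_entry` is hypothetical, the script is assumed to live in ~/bin/ as in the copy step above, and the `CRON_TZ=UTC` line assumes a cron implementation that honors it (cronie does). If your cron does not, drop that line and convert 09:00 UTC to your local time by hand.

```shell
#!/bin/sh
# nightly_cron_entry: print crontab text that starts the run at 09:00 UTC.
# ASSUMPTION: the cron daemon honors CRON_TZ (cronie does); otherwise remove
# that line and write the hour in local time instead.
# ASSUMPTION: auto-mass-check.sh was copied to ~/bin/ as described above.
nightly_cron_entry() {
    # Single quotes keep $HOME literal so cron's shell expands it at run time.
    printf '%s\n' 'CRON_TZ=UTC' '0 9 * * * $HOME/bin/auto-mass-check.sh'
}
```

Run as your non-root user, something like `( crontab -l 2>/dev/null; nightly_cron_entry ) | crontab -` appends the entry to your existing crontab; check the result afterwards with `crontab -l`.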
(External documentation for auto-mass-check script.)
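The filename check from the log inspection step above can be partly automated. The following is a minimal sketch rather than anything shipped with auto-mass-check: the function name `check_log_names` is hypothetical, and the pattern only tests for the ham-/spam- prefix and .log suffix described in the steps, so it cannot tell whether the username part is the one you were assigned.

```shell
#!/bin/sh
# check_log_names DIR: report whether each *.log file in DIR follows the
# ham-<name>.log / spam-<name>.log naming described in the steps above.
# Returns non-zero if any log file has an unexpected name.
check_log_names() {
    log_dir=$1
    rc=0
    for log_file in "$log_dir"/*.log; do
        [ -e "$log_file" ] || continue      # the glob matched nothing
        log_name=${log_file##*/}            # strip the directory part
        case $log_name in
            ham-?*.log|spam-?*.log) echo "ok: $log_name" ;;
            *) echo "unexpected: $log_name"; rc=1 ;;
        esac
    done
    return "$rc"
}
```

A typical call would be `check_log_names ~/masscheckwork/nightly_mass_check` after a test run, before setting the RSYNC options.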
Alternative Methods
The easiest of all methods is to upload your corpora and let us process them for you: UploadedCorpora
The corpus-nightly script is a less maintained alternative to the auto-mass-check script: CorpusNightlyScript
Or you can do it manually: ManualNightlyMassCheck