Rescore Mass-Check

This is the procedure we use to generate new scores. It takes quite a while and is labour-intensive, so we do it infrequently.

1. heads-up

Announce in advance on the -users and -dev lists that mass-checks will be starting shortly, so that contributors have time to get their corpora nice and clean.

2. announce mass-check run 1 (score sets 0 and 1)

See MassCheck. A single mass-check run covers both score sets, e.g.

  cd masses
  echo "use_bayes 0" > spamassassin/user_prefs
  mass-check --net [targets etc]

We then take the log files rsync'd up to the server, and use those logs for both set 0 and set 1; set 0 can be generated from set 1 by stripping out the network tests.
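As a rough illustration of "stripping out the network tests", the filter below removes named rules from the hits column of a log. It is a sketch under assumptions, not the tooling actually used for rescoring: it assumes each log line has the form "flag score path comma-separated-hits [extra fields]", and it takes the list of network rule names as a hand-supplied argument. The function name strip_net_tests and the rule names in the usage line are illustrative only.

```shell
# Sketch only. Assumes log lines look like:
#   <flag> <score> <path> <RULE_A,RULE_B,...> [extra fields]
# $1 is a comma-separated list of network rule names (supplied by hand here).
strip_net_tests() {
  awk -v net_rules="$1" '
    BEGIN {
      # Build a lookup table of the rule names to drop.
      n = split(net_rules, names, ",")
      for (i = 1; i <= n; i++) is_net[names[i]] = 1
    }
    /^#/   { print; next }   # pass comment lines through unchanged
    NF < 4 { print; next }   # no hits column, nothing to strip
    {
      m = split($4, hits, ",")
      kept = ""
      for (i = 1; i <= m; i++) {
        if (hits[i] in is_net) continue
        kept = (kept == "" ? hits[i] : kept "," hits[i])
      }
      # The score column is left untouched; only the hits list changes.
      # If every hit was a network rule, the hits field ends up empty.
      $4 = kept
      print
    }
  '
}
```

Usage, with illustrative file and rule names: strip_net_tests 'RCVD_IN_XBL,URIBL_BLACK' < spam-set1.log > spam-set0.log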

3. allow several days to complete

Allow enough time, including a weekend if possible, for people to get around to running it; they may be busy with day-job stuff.

4. generate scores for score sets 0 and 1

See RunningPerceptron.

Once this is complete, update rules/50_scores.cf with the generated scores.

5. announce mass-check run 2 (set 2) and run 3 (set 3)

See MassCheck. Because set 2 and set 3 both require scores from set 0 and set 1, and both depend on auto-learning, they have to be run separately.

Scoreset 2:

  cd masses
  echo "use_bayes 1" > spamassassin/user_prefs
  mass-check [targets etc]

Scoreset 3:

  cd masses
  echo "use_bayes 1" > spamassassin/user_prefs
  mass-check --net [targets etc]
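The three runs on this page differ only in the use_bayes value and in whether --net is passed. The helper below records that mapping in one place. It is a hypothetical convenience, not part of the masses tooling; the name run_settings is made up for illustration, and the values are taken only from the commands shown on this page.

```shell
# Hypothetical helper: prints the user_prefs line, then the mass-check flag,
# for each run described on this page.
#   run 1 = score sets 0 and 1, run 2 = score set 2, run 3 = score set 3
run_settings() {
  case "$1" in
    1) printf '%s\n' 'use_bayes 0' '--net' ;;
    2) printf '%s\n' 'use_bayes 1' '' ;;      # no network tests for set 2
    3) printf '%s\n' 'use_bayes 1' '--net' ;;
    *) echo "unknown run: $1" >&2; return 1 ;;
  esac
}
```

For example, "run_settings 3" prints "use_bayes 1" on the first line and "--net" on the second.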

6. wait for everyone to complete them

Allow several days, as per step 3.

7. generate scores for score sets 2 and 3