Rescore Mass-Check
This is the procedure we use to generate new scores. It takes quite a while and is labour-intensive, so we do it infrequently.
1. heads-up
Tell everyone in advance, on the -users and -dev lists, that we will be starting mass-checks shortly and that they should get their corpora nice and clean beforehand.
2. announce mass-check run 1 (score sets 0 and 1)
See MassCheck. The mass-check for both score sets can be done in one run, e.g.:
    cd masses
    echo "use_bayes 0" > spamassassin/user_prefs
    mass-check --net [targets etc]
We then take the log files rsync'd up to the server, and use those logs for both set 0 and set 1; set 0 can be generated from set 1 by stripping out the network tests.
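As a rough illustration of what "stripping out the network tests" means, the sketch below removes a given list of rule names from the hits column of each log line. It assumes, for illustration only, that each log line has the shape `<flag> <score> <path> <comma-separated rule hits> [extra fields]` and that the network rule names are supplied in a separate file, one per line. The function name, both file arguments and the rule names in the usage note are hypothetical, and this is not the project's own tooling for the job.

```shell
# strip_net_tests NETRULES_FILE LOG_FILE
# Sketch only. Assumes NETRULES_FILE is non-empty and lists one rule name per
# line, and that field 4 of each log line is the comma-separated list of hits.
# The score in field 2 is left untouched; whitespace between fields is
# normalised to single spaces on rewritten lines.
strip_net_tests() {
    awk '
        NR == FNR { net[$1] = 1; next }   # first file: remember net rule names
        /^#/      { print; next }         # pass comment lines through unchanged
        {
            n = split($4, hits, ",")
            kept = ""
            for (i = 1; i <= n; i++)
                if (!(hits[i] in net))
                    kept = kept (kept == "" ? "" : ",") hits[i]
            $4 = kept
            print
        }
    ' "$1" "$2"
}
```

For example, with a rule list containing `NET_RULE_A`, a line `Y 7 /m/1 LOCAL_A,NET_RULE_A,LOCAL_B time=1` would come out as `Y 7 /m/1 LOCAL_A,LOCAL_B time=1`.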
3. allow several days to complete
Allow enough time, including a weekend if possible, so that people who are busy with day-job work can get around to running it.
4. generate scores for score sets 0 and 1
See RunningPerceptron.
Once this is complete, update rules/50_scores.cf with the generated scores.
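At this stage only the set 0 and set 1 values are new, so an update touches the first two score columns and leaves the other two alone. The sketch below shows that idea for a single rule, assuming a `score` line that carries exactly four values (one per score set, in order 0 to 3) and no trailing comment. The function name and the rule name in the usage note are hypothetical, and this is an illustration rather than the procedure's actual tooling.

```shell
# update_score_01 FILE RULE S0 S1
# Sketch only. Prints FILE to stdout with the set-0 and set-1 columns of the
# matching "score RULE a b c d" line replaced; all other lines pass through.
update_score_01() {
    awk -v rule="$2" -v s0="$3" -v s1="$4" '
        $1 == "score" && $2 == rule && NF == 6 { $3 = s0; $4 = s1 }
        { print }
    ' "$1"
}
```

A possible use would be `update_score_01 rules/50_scores.cf HYPOTHETICAL_RULE 0.5 0.7 > 50_scores.cf.new`, followed by reviewing the new file before moving it into place.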
5. announce mass-check run 2 (set 2) and run 3 (set 3)
See MassCheck. Because set 2 and set 3 both require scores from set 0 and set 1, and both depend on auto-learning, they have to be run separately.
Scoreset 2:
    cd masses
    echo "use_bayes 1" > spamassassin/user_prefs
    mass-check [targets etc]
Scoreset 3:
    cd masses
    echo "use_bayes 1" > spamassassin/user_prefs
    mass-check --net [targets etc]
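The commands in steps 2 and 5 imply a simple mapping from score set number to settings: sets 2 and 3 use Bayes, and sets 1 and 3 use network tests. The helper below is a hypothetical sketch that encodes that mapping, so the right `user_prefs` line and `mass-check` flag can be looked up for a given set. The function name and its output format are assumptions made for illustration.

```shell
# scoreset_settings N
# Sketch only. Prints the user_prefs line and the mass-check network flag
# that correspond to score set N (0-3), as implied by the commands above.
scoreset_settings() {
    set_no=$1
    case "$set_no" in
        0|1|2|3) ;;
        *) echo "unknown score set: $set_no" >&2; return 1 ;;
    esac
    bayes=$(( set_no / 2 ))   # sets 2 and 3 run with Bayes enabled
    net=$(( set_no % 2 ))     # sets 1 and 3 run with network tests
    if [ "$net" -eq 1 ]; then
        flag="--net"
    else
        flag="(none)"
    fi
    printf 'use_bayes %s\n' "$bayes"
    printf 'mass-check flag: %s\n' "$flag"
}
```

Set 0 needs no run of its own, since its logs are derived from the set 1 run by removing the network tests.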
6. wait for everyone to complete them, as per step #3.
Waiting...
7. generate scores for score sets 2 and 3
See RunningPerceptron.