Versions Compared

Comment: [Original edit by JustinMason] use an easier process for stats file generation

...

These can be pretty big (although nowadays the scripts use hard links for the duplicate logfiles, which saves a lot of space).

Also, check in the "config" files you used for each scoreset:

No Format

  svn commit -m "runGA config files used" masses/config.set*

6. upload the test logs to zone

...

No Format
  svn revert rules/50_scores.cf
  wget -O newscores.diff "http://bugzilla.spamassassin.org/....attachment?id=...."
  patch -p0 < newscores.diff
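
If you want to be sure the downloaded diff applies cleanly before it touches rules/50_scores.cf, something like the following may help. This is only a sketch: `apply_scores_patch` is a hypothetical helper name, not part of the SpamAssassin tree, and it assumes GNU patch, which supports `--dry-run`.

```shell
# apply_scores_patch FILE: hypothetical helper, not part of the masses scripts.
# Assumes GNU patch. Tries the patch with --dry-run first, so a diff that
# does not apply leaves the working copy untouched.
apply_scores_patch() {
  patch -p0 --dry-run < "$1" >/dev/null || {
    echo "patch would not apply cleanly: $1" >&2
    return 1
  }
  # The dry run succeeded, so apply the patch for real.
  patch -p0 < "$1"
}
```

Run it from the top of the checkout, in place of the plain patch command: `apply_scores_patch newscores.diff`.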

Then, a little configuration: set the variables below to the paths of the correct gen-setN-* directories for the 4 score sets. The test logs that the stats are measured against will be taken from these directories. NOTE: don't cut and paste these values! They will be different for your runs.

No Format

  genset0=/home/corpus-rsync/corpus/scoregen-3.1/gen-set0-2.0-4.0-100-nobob
  genset1=/home/corpus-rsync/corpus/scoregen-3.1/gen-set1-2.0-4.0-100-nobob
  genset2=/home/corpus-rsync/corpus/scoregen-3.1/gen-set2-2.0-4.625-100-nobob
  genset3=/home/corpus-rsync/corpus/scoregen-3.1/gen-set3-2.0-5.0-100-nobob
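
Since these paths differ for every run, it may be worth checking that each directory really contains the test logs before going further. A minimal sketch, assuming the NSBASE/ham-test.log and SPBASE/spam-test.log layout that the `ln -s` commands on this page rely on; `check_genset` is a hypothetical helper name, not one of the masses scripts.

```shell
# check_genset DIR: hypothetical helper, not part of masses/.
# Succeeds only if DIR holds both test logs in the layout assumed here;
# every missing or unreadable log is reported on stderr.
check_genset() {
  dir=$1
  status=0
  for f in NSBASE/ham-test.log SPBASE/spam-test.log; do
    if [ ! -r "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      status=1
    fi
  done
  return $status
}
```

For example: `for d in "$genset0" "$genset1" "$genset2" "$genset3"; do check_genset "$d" || break; done`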

Once those vars are set, run these commands:

No Format
  cd masses

- rm ham*.log spam*.log ; touch ham.log spam.log
- ln -s $genset0/NSBASE/ham-test.log ham-test.log
- ln -s $genset0/SPBASE/spam-test.log spam-test.log
- bash ./mk-baseline-results 0 > ../rules/STATISTICS-set0.txt
-
- rm ham*.log spam*.log ; touch ham.log spam.log
- ln -s $genset1/NSBASE/ham-test.log ham-test.log
- ln -s $genset1/SPBASE/spam-test.log spam-test.log
- bash ./mk-baseline-results 1 > ../rules/STATISTICS-set1.txt
-
- rm ham*.log spam*.log ; touch ham.log spam.log
- ln -s $genset2/NSBASE/ham-test.log ham-test.log
- ln -s $genset2/SPBASE/spam-test.log spam-test.log
- bash ./mk-baseline-results 2 > ../rules/STATISTICS-set2.txt
-
- rm ham*.log spam*.log ; touch ham.log spam.log
- ln -s $genset3/NSBASE/ham-test.log ham-test.log
- ln -s $genset3/SPBASE/spam-test.log spam-test.log
- bash ./mk-baseline-results 3 > ../rules/STATISTICS-set3.txt
+ cp config.set0 config ; bash ./runGA stats
+ cp config.set1 config ; bash ./runGA stats
+ cp config.set2 config ; bash ./runGA stats
+ cp config.set3 config ; bash ./runGA stats

There'll be a lot of output along these lines:

No Format
ignoring 'TO_ADDRESS_EQ_REAL': immutable and score == 0

But that can be ignored. (TODO: it'd be nice to make this step a little less labour-intensive.)
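
One possible way to cut down the typing is to wrap the per-score-set `cp config.setN config ; bash ./runGA stats` commands in a loop. This is only a sketch: `run_stats_all` and the `RUNNER` variable are hypothetical names introduced here for illustration, not part of the masses scripts.

```shell
# run_stats_all: hypothetical wrapper; run it from the masses directory.
# RUNNER is an assumption added for dry runs: with RUNNER=echo the
# commands are printed instead of executed.
run_stats_all() {
  for n in 0 1 2 3; do
    # Select the config for score set $n, then generate its stats.
    ${RUNNER:-} cp "config.set$n" config || return 1
    ${RUNNER:-} bash ./runGA stats || return 1
  done
}
```

`RUNNER=echo run_stats_all` shows what would be run; a plain `run_stats_all` runs it, stopping at the first command that fails.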

8. upload new stats files

...

And let all and sundry vote on that, too (or just check it in, depending on whether you're in R-T-C mode or not). Once the new scores and STATS files are approved and in SVN, the log data is in a safe archival spot on the zone, the bugzilla bug notes that location, and the "config" files are checked in, you're done.