Comment: Fixed rsync account wording

...

  1. Send an email to private@spamassassin.apache.org requesting an rsync account for nightly mass-checks.
  2. Ensure SpamAssassin and its plugins are fully installed.
  3. Download the auto-mass-check script:
      git clone git://git.fedorahosted.org/auto-mass-check.git
      
  4. Copy auto-mass-check/auto-mass-check.sh to ~/bin/
  5. Copy auto-mass-check/auto-mass-check.cf to ~/.auto-mass-check.cf
  6. Modify ~/.auto-mass-check.cf to point at your ham and spam folders. Be sure to set the folder format correctly to mbox or Maildir. Leave the RSYNC options unchanged for now; you will be running auto-mass-check in test mode at first.
  7. Optionally set TRUSTED_NETWORKS and INTERNAL_NETWORKS in ~/.auto-mass-check.cf
  8. Run auto-mass-check.
    • Look in ~/masscheckwork/nightly_mass_check/ for ham-*.log and spam-*.log files. (Or weekly_mass_check on Saturday.)
    • Are the filenames good? They should be named something like ham-username.log or ham-net-username.log.
    • Read CorpusCleaning and HandClassifiedCorpora for guidelines on how to identify ham in your spam folder and spam in your ham folder, and on which messages should simply be deleted.
    • If you move/delete messages, do not forget to "Compact Folder" to be sure they are actually gone.
    • Repeat auto-mass-check until you are certain both folders are cleaned.
  9. Edit ~/.auto-mass-check.cf and set RSYNC_USERNAME and RSYNC_PASSWORD with values from step 1.
  10. Run auto-mass-check again. With the rsync credentials set, it will upload your results.
  11. Ask a more experienced participant (probably the person who recruited you) to check your results on the server. They can see the uploaded log files by running a command like rsync --old-d username@rsync.spamassassin.org::corpus/
  12. If your upload looks good, then you're probably ready to automate nightly checks. Configure auto-mass-check to run as a cron job as your non-root user at roughly 7AM UTC.
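The log check in step 8 can be sketched as a small shell function. This is only an illustration: the function name check_masscheck_logs is hypothetical and not part of auto-mass-check, and it assumes the work directory and the ham-*.log / spam-*.log naming described above.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with auto-mass-check.
# Reports whether DIR contains at least one ham-*.log and one spam-*.log.
# DIR defaults to the nightly work directory named in step 8.
check_masscheck_logs() {
    dir=${1:-"$HOME/masscheckwork/nightly_mass_check"}
    status=0
    for kind in ham spam; do
        found=0
        for f in "$dir/$kind"-*.log; do
            [ -e "$f" ] || continue   # the glob matched nothing
            found=1
            echo "ok: $(basename "$f")"
        done
        if [ "$found" -eq 0 ]; then
            echo "missing: no $kind-*.log in $dir"
            status=1
        fi
    done
    return "$status"
}
```

On a Saturday, pass the weekly directory instead, for example check_masscheck_logs ~/masscheckwork/weekly_mass_check. The function only confirms that the files exist and shows their names; judging whether the names and contents look right is still a manual step.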
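For the cron job in step 12, a minimal sketch follows. It assumes the script is started as ~/bin/auto-mass-check.sh (the location from step 4) and that the machine's clock is set to UTC. Cron normally interprets times in the system's local time zone, so adjust the hour field if your machine is not on UTC. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; the schedule and script path are assumptions.

# Print a crontab line that runs the script daily at 07:00 system time.
# $HOME is left unexpanded on purpose: cron's shell expands it at run time.
nightly_cron_line() {
    echo '0 7 * * * $HOME/bin/auto-mass-check.sh'
}

# Append that line to the current (non-root) user's crontab.
# Defined here but not called; run it yourself once you are ready.
install_nightly_cron() {
    ( crontab -l 2>/dev/null; nightly_cron_line ) | crontab -
}
```

Running nightly_cron_line alone lets you inspect the entry before installing it. You can also paste the line by hand with crontab -e instead of calling install_nightly_cron.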

...