
(DRAFT - this part of the wiki is a discussion document, based on emails to dev list. Please feel free to add comments, but be sure to make clear that it's your opinion, by signing your name to them. Your real name is preferred, btw.)

The Problem

Here it is, stated by DuncanFindlay: 'SpamAssassin is not as effective as it could be because of the rules that are being used to detect spam. There are two problems here:

  1. The "not enough rules" problem: SpamAssassin does not have enough high quality spam-catching rules. Anecdotally, our FN ratio seems to be much higher with 3.1 than with 3.0 (we won't know for sure until the mass-checks are done). There may be a variety of reasons for this:
    • The SpamAssassin committers are not spending much time writing rules. Attempts to recruit people to become committers to write rules have been somewhat unsuccessful. We could always use more committers and contributors; what can we do to encourage more contribution?
    • People that do write rules for their own use are not willing to go through the fairly elaborate process in order to submit them to SpamAssassin (this currently requires rules to go through bugzilla and then through 70_testing.cf and eventually into our distribution). What can we do to make this process easier and more inviting?

  2. The "release cycle" problem: Any high quality rules that are incorporated into SpamAssassin are not distributed until the next release. Since rules and code are tied together, the release cycle for rules is too long. Submitted rules are not distributed while they are most effective, and rules lose their effectiveness too quickly.'

...

Repository Organization

  • rules/core/ = standard rules directory
  • rules/sandbox/<username>/ = per-user sandboxes
  • rules/extra/<directory>/ = extra rule sets not in core

The proposal is for rules/core to become the rules directory for trunk (3.2 and later). It would be pulled in via SVN externals, which makes its inclusion in the standard SA tree seamless. The sandbox is discussed further in RulesProjMoreInput.
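
To make the three-way layout concrete, here is a minimal sketch that maps a path in the rules tree to its area and owner. The function name classify_rule_path and the example file names are hypothetical and only illustrate the directory convention listed above; they are not part of the SpamAssassin tree.

```python
from pathlib import PurePosixPath


def classify_rule_path(path):
    """Return (area, owner) for a path under rules/, or None if it does not fit.

    area is "core", "sandbox" or "extra"; owner is the sandbox username or
    the extra directory name, and None for core.
    """
    parts = PurePosixPath(path).parts
    if len(parts) < 2 or parts[0] != "rules":
        return None
    area = parts[1]
    if area == "core":
        return ("core", None)
    # sandbox/<username>/... and extra/<directory>/... need a third component
    if area in ("sandbox", "extra") and len(parts) >= 3:
        return (area, parts[2])
    return None
```

For example, classify_rule_path("rules/sandbox/alice/20_test.cf") yields ("sandbox", "alice"), while a path outside rules/ yields None.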

Extras/

We'll want to discuss the structure and process behind creating new extras directories further once we reach a critical mass of committers in the rules project; but here are some initial thoughts on typical 'extra' rulesets.

Outstanding Tasks/Votes

Here's a list of the tasks that have fallen out of the above plan so far... we now need to vote to go forward with these, then put them into action.

First step – the sandboxes:

  • PMC vote to approve the sandboxes project (RulesProjSandboxes).
    • VOTE: passed!
    • Done
  • reorganise the rules directory into core/, sandbox/, and extra/; link the rules project SVN repository to 3.2.0's 'rules' dir, using SVN externals to do this.
    • VOTE: passed
    • Done
  • move current ruleset into a new "core" area
    • Done
  • write scripts to test, filter, and pull rules from sandboxes automatically into core/ production ruleset
    • Dropped in favour of:
  • write scripts to test, filter, and pull rules from sandboxes and core, as a compilation step, into an output directory (see RulesProjPromotion)
    • Done
  • start using the above scripts to generate ruleset in svn
    • Done!
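
The compilation step described above (test and filter rules from core and the sandboxes, then write the result to an output directory) might be sketched roughly as follows. This is a file-level simplification under stated assumptions: the name compile_ruleset, the flattened output file names, and the is_promotable callback are all hypothetical. The real promotion criteria are defined in RulesProjPromotion and are not reproduced here.

```python
import shutil
from pathlib import Path


def compile_ruleset(src_root, out_dir, is_promotable):
    """Copy core rule files, plus sandbox files accepted by is_promotable,
    into out_dir. Returns the list of output file names in copy order.

    is_promotable is a caller-supplied predicate (hypothetical placeholder
    for whatever tests and filters the project actually applies).
    """
    src_root = Path(src_root)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for area in ("core", "sandbox"):
        base = src_root / area
        if not base.is_dir():
            continue
        for cf in sorted(base.rglob("*.cf")):
            # Core files are always included; sandbox files must pass the filter.
            if area == "core" or is_promotable(cf):
                rel = cf.relative_to(src_root)
                # Flatten the path so files from different sandboxes cannot collide.
                target = out_dir / "_".join(rel.parts)
                shutil.copyfile(cf, target)
                copied.append(target.name)
    return copied
```

Treating the output directory as a build product, rather than copying rules back into core/, keeps the sandboxes as the single place where a rule is edited.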

Phase two – mass-checking systems:

  • Weekly mass-check
    • DONE: all rules, with --net
  • Nightly mass-check: web-based user interface for the following data:
    • DONE: freqs for all rules
    • DONE: freqs collated across all users' corpora, or individually
    • DONE: overlaps between rules
    • DONE: historical rule hits data?
    • DONE: rule-by-rule comparative performance figures?
    • DONE: promotion criteria as defined in RulesProjPromotion, so rules that can be promoted can be identified at a glance
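
As an illustration of the kind of figures listed above, the sketch below computes per-rule hit frequencies and pairwise rule overlaps from a collated set of results. The input shape (a list of (is_spam, set of rule names) pairs) and the function names are assumptions made for this example; they do not describe the actual mass-check log format or the project's own reporting scripts.

```python
from collections import Counter
from itertools import combinations


def rule_freqs(messages):
    """Return {rule: {"spam": pct, "ham": pct}}, the percentage of spam and
    ham messages that each rule hit."""
    totals = Counter()
    hits = {"spam": Counter(), "ham": Counter()}
    for is_spam, rules in messages:
        kind = "spam" if is_spam else "ham"
        totals[kind] += 1
        hits[kind].update(rules)
    names = set(hits["spam"]) | set(hits["ham"])
    return {
        name: {
            kind: 100.0 * hits[kind][name] / totals[kind] if totals[kind] else 0.0
            for kind in ("spam", "ham")
        }
        for name in sorted(names)
    }


def rule_overlaps(messages):
    """Return {(rule_a, rule_b): count} of messages hit by both rules,
    with each pair stored in alphabetical order."""
    pair_hits = Counter()
    for _, rules in messages:
        for a, b in combinations(sorted(rules), 2):
            pair_hits[(a, b)] += 1
    return dict(pair_hits)
```

Collating across users' corpora then amounts to concatenating each user's list before calling these functions, while per-user figures come from calling them on one list at a time.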

Phase three:

  • DONE: the RulesProjBuildBot system, comprising these tasks:
    • DONE: set up new buildbot master - http://buildbot.spamassassin.org:8011/
    • DONE: set up user in zone
    • DONE: set up new buildbot slave
    • DONE: set up chroot jail
    • DONE: get mass-check running in chroot
    • DONE: copy in corpus
    • DONE: set up additional slaves for additional corpora
    • DONE: write mass-check wrapper script to:
      • DONE: mass-check that corpus, using rules from .../rulesrc/sandbox/ only
      • DONE: implement strict ulimits
    • DONE: write mass-check-completed script to:
      • DONE: output freqs so it's visible through the Buildbot UI
    • DONE: write mail-handling script to extract mail-submitted rule attachments (PreflightByMail)
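
The "strict ulimits" part of the mass-check wrapper can be illustrated with a short, Unix-only sketch. It is not the project's wrapper script: the function name run_with_limits and the default limit values are assumptions chosen for the example, and the command to run is left to the caller so that no mass-check options are guessed at here.

```python
import resource
import subprocess


def _lower_limit(which, value):
    """Set both soft and hard limit to value, never above the current hard limit."""
    _, hard = resource.getrlimit(which)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(which, (value, value))


def run_with_limits(cmd, cpu_seconds=600, max_bytes=1024 * 1024 * 1024):
    """Run cmd (a list of arguments) with CPU-time and address-space limits.

    The limits are applied in the child process just before exec, so the
    calling process is unaffected. Default values are illustrative only.
    """
    def apply_limits():
        _lower_limit(resource.RLIMIT_CPU, cpu_seconds)
        _lower_limit(resource.RLIMIT_AS, max_bytes)

    return subprocess.run(cmd, preexec_fn=apply_limits,
                          capture_output=True, text=True)
```

Because untrusted sandbox rules run inside the job, setting the hard limit as well as the soft limit matters: the child cannot raise its own limits again.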

Loose-ends-tying-up:

  • TODO: still need to nail down promotion criteria, in particular performance figures
  • TODO: fix bugs as they crop up
  • 'Aggressive' rulesets, which are too likely to produce FPs for the base release
  • non-spam-oriented rules, such as the anti-virus-bounce ruleset
  • non-English-language rulesets (although see RulesNotEnglish)