Rules Project: Streamlining the rules process
(part of RulesProjectPlan)
Problem description: 'People that do write rules for their own use are not willing to go through the fairly elaborate process in order to submit them to SpamAssassin (this currently requires rules to go through bugzilla and then through 70_testing.cf and eventually into our distribution). What can we do to make this process easier and more inviting?'
First, the sandbox idea greatly increases the number of people who can check rules into SVN. Second, the barrier to entry for getting a sandbox account is much lower.
Some bullet points from discussion (these need expanding):
sandbox:
- each user gets their own sandbox as discussed on RulesProjMoreInput
- checked-in rules in the sandboxes are mass-checked in the nightly mass-checks
- migrating a rule from the "sandbox" (dev) ruleset to the "core" (production) ruleset uses C-T-R (commit-then-review), i.e. votes are not required in advance
- migrating a rule from "sandbox" to the "extra" ruleset also uses C-T-R
Rules that get promoted from a "sandbox" to "core" should pass the following criteria:
- S/O ratio of 0.95 or greater (or 0.05 or less for nice rules)
- > 0.25% of target type hit (e.g. spam for non-nice rules)
- < 1.00% of non-target type hit (e.g. ham for non-nice rules)
- not too slow
- TODO: criteria for overlap with existing rules?
BobMenschel: The method I used for weeding out SARE rules that overlapped 3.0.0 rules, was to run a full mass-check with overlap analysis, and throw away anything where the overlap is less than 50%. Manually reviewing the remaining (significantly) overlapping rules was fairly easy. The command I use is:
    perl ./overlap ../rules/tested/$testfile.ham.log ../rules/tested/$testfile.spam.log | grep -v mid= | awk ' NR == 1 { print } ; $2 + 0 == 1.000 && $3 + 0 >= 0.500 { print } ' >../rules/tested/$testfile.overlap.out
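The numeric promotion criteria listed above could be checked automatically along these lines. This is only a rough sketch and not part of any existing tool: the names RuleStats and meets_core_criteria are hypothetical, and it assumes S/O is computed from hit percentages as spam% / (spam% + ham%). The "not too slow" and overlap criteria are left out because no thresholds have been agreed for them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RuleStats:
    """Hit counts for one rule from a mass-check run (hypothetical structure)."""
    spam_hits: int
    ham_hits: int
    spam_total: int   # spam messages in the corpus, assumed > 0
    ham_total: int    # ham messages in the corpus, assumed > 0
    nice: bool = False  # True for "nice" rules, which target ham


def so_ratio(stats: RuleStats) -> Optional[float]:
    """S/O ratio, assumed here to be spam% / (spam% + ham%)."""
    spam_frac = stats.spam_hits / stats.spam_total
    ham_frac = stats.ham_hits / stats.ham_total
    if spam_frac + ham_frac == 0:
        return None  # the rule never hit, so the ratio is undefined
    return spam_frac / (spam_frac + ham_frac)


def meets_core_criteria(stats: RuleStats) -> bool:
    """Check the S/O and hit-rate thresholds from the list above."""
    so = so_ratio(stats)
    if so is None:
        return False
    spam_pct = 100.0 * stats.spam_hits / stats.spam_total
    ham_pct = 100.0 * stats.ham_hits / stats.ham_total
    if stats.nice:
        target_pct, non_target_pct, so_ok = ham_pct, spam_pct, so <= 0.05
    else:
        target_pct, non_target_pct, so_ok = spam_pct, ham_pct, so >= 0.95
    return so_ok and target_pct > 0.25 and non_target_pct < 1.00
```

For example, a non-nice rule hitting 500 of 100000 spam and 2 of 100000 ham would pass, while one hitting only 100 of 100000 spam would fail the 0.25% target hit rate.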
A ruleset in the "extra" set would have different criteria.
We can also vote for extraordinary stuff that doesn't fit into those criteria...
private list for mass-checks:
- archives delayed 1 month?
- moderated signups
- automated mass-checks of attachments in a specific file format
- rules considered suitable for use are checked into the "sandbox" area for a quick nightly mass-check before release