Rules Project: Promotion of Rules
from the sandboxes to the core ruleset, that is.
(this page split from RulesProjSandboxes, part of RulesProjectPlan)
JustinMason: note that SVN paths are listed as "ROOT/rules/trunk". This is the trunk; by having that, it allows branches of the rules project at e.g. "ROOT/rules/branches/vX.Y.Z", similarly to how the code SVN repo has trunk and branches. (As to what way exactly we'd branch, versions, etc. let's see how that develops in the future.)
Getting rules from the sandbox, into the distribution:
- each user gets their own sandbox as discussed on RulesProjSandboxes
- checked-in rules in the sandboxes are mass-checked in the nightly mass-checks
- migrating a rule from "sandbox" (dev) to the "core" (production) ruleset uses C-T-R (commit-then-review); i.e. votes are not required in advance
- migrating from "sandbox" to the "extra" ruleset also uses C-T-R
Rules that get promoted from a "sandbox" to "core" should pass the following criteria:
- pass "--lint"!
- S/O ratio of 0.95 or greater (or 0.05 or less for nice rules)
- > 0.25% of target type hit (e.g. spam for non-nice rules)
- < 1.00% of non-target type hit (e.g. ham for non-nice rules)
These numbers are really just ball-park figures and should be fine-tuned as we go. (DuncanFindlay)
We can automate those criteria pretty easily. We can also vote for rules that don't pass those criteria but that we think should be put into core for some reason.
TODO: we need a tool, probably web-based and run from the nightly mass-check results, that measures these criteria and produces a list of the rules that pass.
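As a rough illustration of how the criteria above could be automated, here is a minimal sketch in Python. It is not the project's tool: the RuleStats record, the function names and the input format are all hypothetical. It also assumes that S/O is computed from hit percentages as spam% / (spam% + ham%), and that "--lint" has been run separately with its result passed in as a flag. The thresholds are the ball-park figures listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RuleStats:
    """Hypothetical per-rule summary taken from one nightly mass-check."""
    name: str
    spam_hits: int
    ham_hits: int
    is_nice: bool = False   # "nice" rules are meant to hit ham, not spam
    lint_ok: bool = True    # result of a separate "--lint" run (assumed input)


def so_ratio(spam_pct: float, ham_pct: float) -> Optional[float]:
    """S/O from hit percentages (assumed definition); None if the rule never hit."""
    total = spam_pct + ham_pct
    return spam_pct / total if total > 0 else None


def passes_criteria(rule: RuleStats, total_spam: int, total_ham: int) -> bool:
    """Apply the ball-park promotion criteria to one rule."""
    if total_spam <= 0 or total_ham <= 0:
        raise ValueError("corpus sizes must be positive")
    if not rule.lint_ok:
        return False

    spam_pct = 100.0 * rule.spam_hits / total_spam
    ham_pct = 100.0 * rule.ham_hits / total_ham
    so = so_ratio(spam_pct, ham_pct)
    if so is None:
        return False

    if rule.is_nice:
        # Nice rules: target type is ham, S/O must be 0.05 or less.
        target_pct, other_pct, so_ok = ham_pct, spam_pct, so <= 0.05
    else:
        # Normal rules: target type is spam, S/O must be 0.95 or greater.
        target_pct, other_pct, so_ok = spam_pct, ham_pct, so >= 0.95

    return so_ok and target_pct > 0.25 and other_pct < 1.00
```

A report tool could call passes_criteria once per rule and list the names for which it returns True; rules that fail could still be promoted by vote, as noted above.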
Future criteria:
- not too slow (TODO: need an automated way to measure that)
- TODO: criteria for overlap with existing rules? see 'overlap criteria' below.
Moving files out of trunk into the new rules project
Code-tied rules stay with main tree in current rules directory, with the exception of 25_replace.cf which is really just another way to write body/header rules. Basically, the static stuff that is tied to code does not move to the rules project. Everything else moves.
In more detail – files that DO NOT move to rules project:
- 25_accessdb.cf (plugins in core code)
- 25_antivirus.cf
- 25_dcc.cf
- 25_domainkeys.cf
- 25_hashcash.cf
- 25_pyzor.cf
- 25_razor2.cf
- 25_spf.cf
- 25_textcat.cf
- 25_uribl.cf
- 60_awl.cf
- 60_whitelist_subject.cf
- 20_dnsbl_tests.cf (eval tests in EvalTests.pm)
- 20_html_tests.cf (rawbody ones can move to ROOT/rules/trunk/core/)
- 20_net_tests.cf
- 23_bayes.cf
- 60_whitelist.cf
- init.pre (Misc non-cf files)
- local.cf
- name-triplets.txt
- regression_tests.cf
- triplets.txt
- user_prefs.template
- v310.pre
Files that DO get moved:
- 25_body_tests_es.cf -> ROOT/rules/trunk/core/es/
- 25_body_tests_pl.cf -> ROOT/rules/trunk/core/pl/
- 30_text_de.cf -> ROOT/rules/trunk/core/de/
- 30_text_fr.cf -> ROOT/rules/trunk/core/fr/
- 30_text_it.cf -> ROOT/rules/trunk/core/it/
- 30_text_nl.cf -> ROOT/rules/trunk/core/nl/
- 30_text_pl.cf -> ROOT/rules/trunk/core/pl/
- 30_text_pt_br.cf -> ROOT/rules/trunk/core/pt_br/
- 20_advance_fee.cf -> ROOT/rules/trunk/core/
- 20_drugs.cf -> ROOT/rules/trunk/core/
- 20_p**n.cf -> ROOT/rules/trunk/core/ [wikicensorship!]
- 10_misc.cf -> ROOT/rules/trunk/core/
- 20_anti_ratware.cf -> ROOT/rules/trunk/core/
- 20_body_tests.cf -> ROOT/rules/trunk/core/
- 20_compensate.cf -> ROOT/rules/trunk/core/
- 20_fake_helo_tests.cf -> ROOT/rules/trunk/core/
- 20_head_tests.cf -> ROOT/rules/trunk/core/
- 20_meta_tests.cf -> ROOT/rules/trunk/core/
- 20_phrases.cf -> ROOT/rules/trunk/core/
- 20_ratware.cf -> ROOT/rules/trunk/core/
- 20_uri_tests.cf -> ROOT/rules/trunk/core/
- 25_replace.cf -> ROOT/rules/trunk/core/ [code dependent, but these will change a lot]
- 50_scores.cf -> ROOT/rules/trunk/core/
- 60_whitelist_spf.cf -> ROOT/rules/trunk/core/
(update: this is now complete.)
Algorithm for compilation
The ROOT/rules/trunk svn path is now the rules source directory.
The ROOT/trunk/rules svn path, i.e. "rules" in the SpamAssassin source tree, is the rules build output directory.
Rules are compiled from source dir to output dir. All rules in "core" are always promoted (for backwards compatibility). In addition, rules in the sandboxes will be promoted if the rules source file contains a "publish NAME_OF_RULE" command. This command is added (by hand!), one per rule, to the source file by committers, as the rules pass the validation criteria.
Rules will be autorenamed if there's a collision between a new rule name and one that's already been output by the compiler.
The compiler will copy the rules to the output directory. By default, the filename is preserved; so a rule in a file called "20_foo.cf" in the source directory will be output to the file "20_foo.cf".
"pubfile" is another command, which overrides that behaviour by selecting the name of the output file in the "rules" directory: "pubfile NN_filename.cf".
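To make the promotion and output-file selection concrete, here is a minimal sketch in Python. It is not the actual compiler. It assumes that each "publish" or "pubfile" command sits on a line of its own, and that a "pubfile" command applies to the whole source file, with the last one winning if there are several. All function names are hypothetical. Collision autorenaming is left out because the renaming scheme is not specified here.

```python
import re
from typing import Optional, Set, Tuple

# Assumption: each command is on its own line, optionally indented.
PUBLISH_RE = re.compile(r"^\s*publish\s+(\S+)\s*$")
PUBFILE_RE = re.compile(r"^\s*pubfile\s+(\S+)\s*$")


def scan_directives(source_text: str) -> Tuple[Set[str], Optional[str]]:
    """Collect published rule names and the optional pubfile name from one source file."""
    published: Set[str] = set()
    pubfile: Optional[str] = None
    for line in source_text.splitlines():
        match = PUBLISH_RE.match(line)
        if match:
            published.add(match.group(1))
            continue
        match = PUBFILE_RE.match(line)
        if match:
            pubfile = match.group(1)  # assumption: the last pubfile in the file wins
    return published, pubfile


def is_promoted(rule_name: str, in_core: bool, published: Set[str]) -> bool:
    """Core rules are always promoted; sandbox rules need a publish command."""
    return in_core or rule_name in published


def output_filename(source_filename: str, pubfile: Optional[str]) -> str:
    """By default the source filename is preserved; pubfile overrides it."""
    return pubfile if pubfile is not None else source_filename
```

For example, a sandbox file "20_foo.cf" containing "publish FOO_RULE" and no "pubfile" line would have FOO_RULE written to "20_foo.cf" in the output directory, while any other rule in that file would be skipped.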
(TODO: linting during compilation, and ignore lint-failures? may have to reimplement a small subset of lint behaviour to do this.)