ReleaseGoals

Overview

lower resource usage: higher throughput and lower memory usage
higher accuracy: fewer false positives (FPs) and fewer false negatives (FNs) (rules, rules, rules... this also includes some notion of speeding up the mass-check process)
convert optional/non-performance-sensitive code to plugins (I think this is lower priority, but we've often talked about it and it also helps achieve the first goal of lower resource usage)

anti-goals

features: extra options, non-critical changes not related to the above goals, etc. (except perhaps in plugins)
option bloat (except perhaps in plugins)

Memory Usage

We should agree on what we want to convert to plugins. Here is the list so far, based mostly on conversations with Theo, Justin, and Michael:

Razor
DCC
Pyzor
SpamCop reporting
remove AWL and replace it with a "History" plugin
TextCat

Performance/Speed

Predictive autolearn? Do a check before bayes_check: if we are likely to autolearn, go read/write (r/w) instead of read-only (r/o). This can be implemented on the first bayes_check call.
Don't bother caching full/decoded/etc. at the start in PMS. How much caching do we do now? Is it done multiple times in PMS? This may not be an issue because of references.
Short-circuiting (SC) ideas:
- mark certain rules as SC, so that a hit stops the scan
  - USER_IN_WHITELIST, USER_IN_BLACKLIST (not DEF)
  - BSP
  - HABEAS
- allow SC on ham score (i.e. < #)
- allow SC on spam score (i.e. > #)
- should autolearn skip SC messages? Or should we always autolearn in the appropriate direction?
- AWL should be skipped during SC
- SC rules should have a negative priority so they run first
- do *not* do score check per rule, do it either per priority or rule type (header, body, etc.)
- SC will require its own is_spam decision, since the score and required_hits can be at odds when a scan stops early
- add SC header macro (get_tag)
- SC for S/O 1.000 rules? how about S/O near 1? BAYES_99, etc.
- Some form of order/priority rearrangement:
  - Blacklist: short
  - Whitelist: user/admin wants it
  - BSP/Habeas: reputable, non-forgeable
  - Other SC Rules: as early as possible
  - Other Local Rules: lightweight
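
The short-circuiting ideas above can be illustrated with a minimal sketch. This is not SpamAssassin code: the `Rule`, `Result`, and `scan` names, the `sc_spam`/`sc_ham` parameters, and the convention that an SC rule's score sign decides is_spam are all assumptions made for illustration. It shows SC rules running first via negative priorities, the check being made once per priority group rather than per rule, and the is_spam verdict being set by the short circuit instead of by comparing the score with required_hits.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Callable, List, Optional


@dataclass
class Rule:
    name: str
    priority: int                   # lower runs first; SC rules get negative values
    score: float
    matches: Callable[[str], bool]  # hypothetical test function for this sketch
    short_circuit: bool = False


@dataclass
class Result:
    is_spam: bool
    score: float
    hits: List[str]
    short_circuited_by: Optional[str] = None


def scan(message, rules, required=5.0, sc_spam=None, sc_ham=None):
    """Run rules in priority order, checking for a short circuit per group."""
    total = 0.0
    hits = []
    ordered = sorted(rules, key=lambda r: r.priority)
    for _priority, group in groupby(ordered, key=lambda r: r.priority):
        sc_rule = None
        for rule in group:
            if rule.matches(message):
                total += rule.score
                hits.append(rule.name)
                if rule.short_circuit and sc_rule is None:
                    sc_rule = rule
        # One check per priority group, not per rule.
        if sc_rule is not None:
            # Assumption: the sign of the SC rule's score gives the verdict,
            # because the partial score cannot be compared with required_hits.
            return Result(sc_rule.score > 0, total, hits, sc_rule.name)
        if sc_spam is not None and total > sc_spam:
            return Result(True, total, hits, "score")
        if sc_ham is not None and total < sc_ham:
            return Result(False, total, hits, "score")
    # No short circuit: fall back to the usual threshold comparison.
    return Result(total >= required, total, hits)
```

With a whitelist rule at priority -100 marked as SC, a matching message returns before any priority-0 rule is evaluated, which is where the throughput gain would come from.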

Speed Release Cycle

Single-cycle mass-check
- Add sample-based "autolearning" to mass-check
- One run with network and bayes turned on
- Related, but not required, change to autolearning: the balancing of in and out (accuracy)

Accuracy Ideas

network test: do DNS lookups on the HELO (A, NS, and SURBL)
network test: do DNS lookups on the EnvelopeFrom (SURBL)
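
As a rough sketch of what these two network tests would query, the helpers below only build the list of (name, record type) pairs and perform no lookups. The function names are hypothetical, the `multi.surbl.org` zone is used as the default SURBL list, and appending the whole host name to the zone is a simplification: a real check would probably reduce the name to its registered domain first.

```python
from typing import List, Tuple

Query = Tuple[str, str]  # (DNS name, record type)


def helo_queries(helo: str, surbl_zone: str = "multi.surbl.org") -> List[Query]:
    """Queries for the HELO name: A, NS, and a SURBL-style lookup."""
    host = helo.strip().rstrip(".").lower()
    if not host:
        return []
    return [
        (host, "A"),
        (host, "NS"),
        (f"{host}.{surbl_zone}", "A"),  # simplification: full host name, not registered domain
    ]


def envelope_from_queries(envelope_from: str,
                          surbl_zone: str = "multi.surbl.org") -> List[Query]:
    """SURBL-style query for the domain of the envelope sender."""
    addr = envelope_from.strip().strip("<>")
    _local, sep, domain = addr.rpartition("@")
    domain = domain.rstrip(".").lower()
    if not sep or not domain:
        return []  # null sender ("<>") or malformed address: nothing to look up
    return [(f"{domain}.{surbl_zone}", "A")]
```

Keeping query construction separate from the resolver makes it easy to run these lookups alongside the other network tests, and to test the name handling without network access.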
