Overview
- lower resource usage: higher throughput and lower memory usage
- higher accuracy: fewer FPs and fewer FNs (rules, rules, rules... this also includes some notion of speeding up the mass-check process)
- convert optional/non-performance-sensitive code to plugins (I think this is lower priority, but we've often talked about it and it also helps achieve the first goal of lower resource usage)
Anti-goals
- features: extra options, non-critical changes not related to the above goals, etc. (except perhaps in plugins)
- option bloat (except perhaps in plugins)
Memory Usage
We should probably evolve some understanding of what we want to convert to plugins. Here's the list mostly based on conversations with Theo, Justin, and Michael:
Performance/Speed
- Predictive autolearn? Do a check before bayes_check: if we are likely to autolearn, go r/w instead of r/o. This could be implemented on the first bayes_check call.
- Don't bother caching full/decoded/etc. at start in PMS. How much caching do we do now? Is it done multiple times in PMS? It may not be an issue due to references.
- Short-circuiting (SC) ideas:
  - set certain rules as SC if hit
    - USER_IN_WHITELIST, USER_IN_BLACKLIST (not DEF)
    - BSP
    - HABEAS
  - allow SC on ham score (i.e. < #)
  - allow SC on spam score (i.e. > #)
  - should autolearn skip SC messages? Should we always do autolearn in the appropriate direction?
  - AWL should be skipped during SC
  - SC rules should have a negative priority so they run first
  - do *not* do the score check per rule; do it either per priority or per rule type (header, body, etc.)
  - SC will require its own is_spam result, as score + required_hits will be at odds
  - add SC header macro (get_tag)
  - SC for S/O 1.000 rules? How about S/O near 1? BAYES_99, etc.
- Some form of order/priority rearrangement:
  - Blacklist: short
  - Whitelist: user/admin wants it
  - BSP/Habeas: reputable, non-forgeable
  - Other SC Rules: as early as possible
  - Other Local Rules: lightweight
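The short-circuiting ideas above can be sketched roughly as follows. This is only an illustration in Python, not SpamAssassin code: the `Rule` class, the `scan` function, the threshold parameters and the example scores are all made up for the sketch. It shows three of the points from the list: SC rules get a negative priority so they run first, a hit on an SC rule ends the scan, and the score thresholds are checked once per priority group rather than after every rule.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule record; not SpamAssassin's internal representation."""
    name: str
    score: float
    priority: int                    # lower runs first; SC rules get negative values
    matches: Callable[[str], bool]
    short_circuit: bool = False      # a hit on this rule ends the scan


def scan(message: str,
         rules: List[Rule],
         ham_below: Optional[float] = None,
         spam_above: Optional[float] = None) -> Tuple[float, List[str], bool]:
    """Run rules in priority order and return (score, hits, short_circuited)."""
    score = 0.0
    hits: List[str] = []
    ordered = sorted(rules, key=lambda r: r.priority)
    for _, group in groupby(ordered, key=lambda r: r.priority):
        sc_hit = False
        for rule in group:
            if rule.matches(message):
                score += rule.score
                hits.append(rule.name)
                sc_hit = sc_hit or rule.short_circuit
        # Stop conditions are evaluated once per priority group, not per rule.
        if sc_hit:
            return score, hits, True
        if spam_above is not None and score > spam_above:
            return score, hits, True
        if ham_below is not None and score < ham_below:
            return score, hits, True
    return score, hits, False


# Illustrative rules only; the scores and match functions are invented.
EXAMPLE_RULES = [
    Rule("USER_IN_WHITELIST", -100.0, -10,
         lambda m: "friend@example.com" in m, short_circuit=True),
    Rule("BODY_PILLS", 2.5, 0, lambda m: "pills" in m),
    Rule("BODY_FREE", 1.5, 0, lambda m: "free" in m),
    Rule("EXPENSIVE_CHECK", 3.0, 10, lambda m: True),
]
```

With these example rules, a message from the whitelisted address stops after the first priority group, and `scan("free pills", EXAMPLE_RULES, spam_above=3.5)` stops after the body rules without running the expensive check. Skipping AWL and deciding what autolearn should do with short-circuited messages are left out of the sketch.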
Speed Release Cycle
- Single-cycle mass-check
- Add sample-based "autolearning" to mass-check
- One run with network and bayes turned on
- Related, but non-required change to autolearning: the balancing of in and out (accuracy)
Accuracy Ideas
- network test: do DNS lookups on the HELO name (A, NS, and SURBL)
- network test: do DNS lookups on the EnvelopeFrom domain (SURBL)
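As a rough illustration of the two network tests above, here is a Python sketch of which DNS queries they would issue. Everything in it is an assumption made for the example: the function names, the injected `resolve(name, record_type)` callable (so the sketch does not depend on a particular DNS library), and the use of the combined `multi.surbl.org` zone. A real SURBL check would also reduce the name to its registered domain first, which this sketch does not do.

```python
from typing import Callable, Dict, List

# Hypothetical resolver signature: (query name, record type) -> list of answers.
Resolver = Callable[[str, str], List[str]]

SURBL_ZONE = "multi.surbl.org"  # assumption: query the combined SURBL zone


def surbl_query_name(domain: str, zone: str = SURBL_ZONE) -> str:
    """Build the name to look up in the SURBL zone for a given domain."""
    return f"{domain.strip('.').lower()}.{zone}"


def envelope_from_domain(address: str) -> str:
    """Return the domain part of an envelope sender, or '' if there is none."""
    _local, at, domain = address.strip("<>").rpartition("@")
    return domain.lower() if at else ""


def helo_lookups(helo: str, resolve: Resolver) -> Dict[str, List[str]]:
    """Issue the A, NS and SURBL queries for the HELO name."""
    host = helo.strip(".").lower()
    return {
        "A": resolve(host, "A"),
        "NS": resolve(host, "NS"),
        "SURBL": resolve(surbl_query_name(host), "A"),
    }


def envelope_from_lookups(address: str, resolve: Resolver) -> Dict[str, List[str]]:
    """Issue the SURBL query for the envelope sender's domain, if it has one."""
    domain = envelope_from_domain(address)
    if not domain:
        return {"SURBL": []}  # null sender (<>) or malformed address: nothing to query
    return {"SURBL": resolve(surbl_query_name(domain), "A")}
```

Passing the resolver in as a parameter keeps the sketch testable with a fake lookup table; how the answers would map to rules and scores is not decided here.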