Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: further clarify

...

A good rule has a very extreme S/O (near as possible to 1.0 or 0.0) and a high percentage of hits in the correct category. In other words, RCVD_IN_OPM_HTTP is a very good rule in the example above, because it hits 5.2028% of all spam mails without hitting any ham at all (no false positives).

---- (warning) Edit conflict - other version: ----
S/O stands for "spam% / overall%", in other words, the proportion of the total hits that were spam messages. As such, it is equivalent to Bayesian probability, or Positive Predictive Value in bioinformatics or medicine.

---- (warning) Edit conflict - your version: ----
S/O stands for "spam / overall" for which the formula is "spam% / (ham% + spam%)", in other words, the proportion of the total hits that were spam messages. As such, it is equivalent to Bayesian probability, or Positive Predictive Value in bioinformatics or medicine.

---- (warning) End of edit conflict ----

Measuring Rule Overlap

There's one more tool to determine how much 2 rules overlap with each other – "overlap". This is occasionally useful if you suspect that two rules are redundant, checking the same data or hitting exactly the same messages as each other. Take a look at the comments at the top of the "masses/overlap" script for details on how to run this against one or more "mass-check" output log files.

...