Spam Filter Batting Average

[http://jgc.org/ John Graham-Cumming] proposed this uniform measure of spam-filter effectiveness in [http://www.jgc.org/antispam/11162004-baafcd719ec31936296c1fb3d74d2cbd.pdf his November 16, 2004 article entitled 'Understanding Spam Filter Accuracy'].

Essentially, it's a reformatting of the FalsePositive percentage and FalseNegative percentage, as 'spam hit rate / ham strike rate'. This can be computed from FP%/FN% as follows:

  let fp = false positive percentage
  let fn = false negative percentage
  batting average hitrate = (1 - (fn / 100))
  batting average strikerate = (fp / 100)
  batting average = "hitrate/strikerate"

So if you have an FP% of 0.03% and an FN% of 2.47%, the batting average is

  (1 - (2.47 / 100)) "/" (0.03 / 100) = .9753/.0003

That's actually the correct batting average for SpamAssassin 3.0.0's scoreset 3, measured against the validation corpus when we released it. ;)
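The conversion can also be sketched in a few lines of Python. This is only an illustration of the formula above, not code from SpamAssassin; the function names `batting_average` and `format_batting_average` are made up for this example, and the inputs are assumed to be percentages in the range 0 to 100.

```python
def batting_average(fp_percent, fn_percent):
    """Return (hit_rate, strike_rate) as fractions between 0 and 1.

    fp_percent and fn_percent are percentages (e.g. 0.03 means 0.03%).
    Hypothetical helper, written only to illustrate the formula.
    """
    for name, value in (("fp_percent", fp_percent), ("fn_percent", fn_percent)):
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
    hit_rate = 1.0 - fn_percent / 100.0   # hitrate = 1 - (fn / 100)
    strike_rate = fp_percent / 100.0      # strikerate = fp / 100
    return hit_rate, strike_rate


def format_batting_average(fp_percent, fn_percent, places=4):
    """Format the pair as 'hitrate/strikerate' with a fixed number of decimals."""
    hit_rate, strike_rate = batting_average(fp_percent, fn_percent)
    return f"{hit_rate:.{places}f}/{strike_rate:.{places}f}"


# An FP% of 0.03 and an FN% of 2.47:
print(format_batting_average(0.03, 2.47))  # prints 0.9753/0.0003
```

Note that Python's fixed-point formatting keeps the leading zero, so the result reads 0.9753/0.0003 rather than .9753/.0003.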

See also MeasuringAccuracy for other schemes used, or FpFnPercentages for the main one we use in SpamAssassin.