What does X-Spam-Status mean?

SpamAssassin adds an extra email header such as

X-Spam-Status: Yes, score=21.6 required=4.0 tests=BATMAIL,BAYES_99,
DATE_IN_FUTURE_06_12,FS_REPLICA,FS_REPLICAWATCH,RAZOR2_CF_RANGE_51_100,
RAZOR2_CF_RANGE_E8_51_100,RAZOR2_CHECK,RCVD_IN_PBL,RCVD_IN_SORBS_DUL,
RDNS_DYNAMIC,URIBL_AB_SURBL,URIBL_BLACK,URIBL_JP_SURBL,URIBL_WS_SURBL
autolearn=spam version=3.2.1

This header is visible when the mail client is configured to "show full headers", or a similar option.

The fields are the following:

Whether the message is spam (Yes/No)
score: the total score for the message (can be negative if whitelisted)
required: the score the message must reach to be classed as spam
tests: the comma-separated list of tests that returned a non-zero value
autolearn: whether autolearn learned the message as spam or ham
version: the version of SpamAssassin that was used
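
To inspect these fields programmatically, the header value can be split into its parts. The following is a minimal sketch, not part of SpamAssassin itself: the function name parse_spam_status and the returned dictionary keys are illustrative assumptions, and it only handles the header layout shown in the example above.

```python
import re

# The example header value from above, including the folded continuation lines.
EXAMPLE = """Yes, score=21.6 required=4.0 tests=BATMAIL,BAYES_99,
 DATE_IN_FUTURE_06_12,FS_REPLICA,FS_REPLICAWATCH,RAZOR2_CF_RANGE_51_100,
 RAZOR2_CF_RANGE_E8_51_100,RAZOR2_CHECK,RCVD_IN_PBL,RCVD_IN_SORBS_DUL,
 RDNS_DYNAMIC,URIBL_AB_SURBL,URIBL_BLACK,URIBL_JP_SURBL,URIBL_WS_SURBL
 autolearn=spam version=3.2.1"""


def parse_spam_status(value):
    """Parse an X-Spam-Status header value (hypothetical helper)."""
    # Unfold the header: collapse line breaks and runs of whitespace.
    text = " ".join(value.split())

    # The verdict is everything before the first comma ("Yes" or "No").
    verdict = text.split(",", 1)[0].strip().lower()

    # Collect key=value pairs; a value runs until the next " key=" or the end.
    fields = dict(re.findall(r"(\w+)=(.*?)(?=\s+\w+=|$)", text))

    # The test list may contain spaces left over from header folding.
    tests = [t.strip() for t in fields.get("tests", "").split(",") if t.strip()]

    return {
        "is_spam": verdict == "yes",
        "score": float(fields["score"]),
        "required": float(fields["required"]),
        "tests": tests,
        "autolearn": fields.get("autolearn"),
        "version": fields.get("version"),
    }


if __name__ == "__main__":
    result = parse_spam_status(EXAMPLE)
    print(result["is_spam"], result["score"], result["required"])
    print(len(result["tests"]), "tests hit")
```

The sketch assumes the score= and required= fields are always present; a header laid out differently would need extra handling.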

To find out why a message was classified as spam (got the score it did), check the tests listed.

You can find all of the currently active rules with descriptions and scores in the Subversion repository under /trunk/rules or by downloading the latest published set using the sa-update tool.

If a message is classed as spam, by default it is attached to a text message that lists the tests it hit, the points assigned to each, and a short description of each.

How are the scores assigned?

SA's scores are assigned using a genetic algorithm (GA), to optimise their efficiency and minimise false positives and false negatives. More information can be found on the 'Tests' page. Note that you can help this system by providing statistics on your mail spool.

Some DNS blacklist rules are distributed with scores of 0. These generally request or require payment, and as such are disabled by default. Feel free to enable the lookups, if you've paid for them.

A score of 0 will stop a rule from being run.
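
The effect of a zero score can be sketched as follows. This is an illustration only, not SpamAssassin's actual implementation: the rule names, the scores, and the functions active_rules and classify are all hypothetical. It shows a zero-scored rule being dropped from the active set, and the verdict coming from comparing the summed scores of the remaining hits with the required score.

```python
# Hypothetical rule scores; the names and values are invented for illustration.
EXAMPLE_SCORES = {
    "EXAMPLE_RULE_A": 3.5,
    "EXAMPLE_RULE_B": 2.0,
    "EXAMPLE_PAID_DNSBL": 0.0,  # scored 0, so treated as disabled
}


def active_rules(scores):
    """Return the names of rules that would be run (non-zero score)."""
    return {name for name, score in scores.items() if score != 0}


def classify(hits, scores, required):
    """Sum the scores of the active rules that hit and compare to the threshold."""
    enabled = active_rules(scores)
    total = sum(scores[name] for name in hits if name in enabled)
    return total >= required, total


if __name__ == "__main__":
    hits = ["EXAMPLE_RULE_A", "EXAMPLE_RULE_B", "EXAMPLE_PAID_DNSBL"]
    print(classify(hits, EXAMPLE_SCORES, required=5.0))
```

Giving the disabled rule a non-zero score in this sketch would put it back into the active set, which corresponds to enabling a paid lookup.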

Note: Scores for "learn" rules such as BAYES_*, which rate the probability that a message is spam, are assigned using the same method. This can produce "confusing" results, for instance BAYES_80 having a higher score than BAYES_99. There are two main reasons for this. 1) The GA does not understand that the BAYES_* rules are related to one another; to the GA they are separate rules that need separate scores. 2) More importantly, the higher the probability from a "learn" rule, the more likely it is that the message also hit a number of other rules. This lets the GA lower the "learn" rule's score to limit the damage from its inevitable false positives, while the message is still marked as spam via the other rules' scores.
