Comment: added link to ManualWhitelist

...

The algorithm works with a local database of entries. Each entry has a key formed from the identifier and, optionally, the IP address the message originated from and the DKIM signature. Each entry stores a TOTAL score of messages and a COUNT of messages; the MEAN score is TOTAL/COUNT. Each sender is identified by several IDs: the From email address in combination with the originating IP block (or DKIM signature, or SPF pass, if available), the standalone From email address (without any IP), the domain name of the From address, the full IP address, and the HELO name of the originating client. Each of these ID types has a configurable weight factor that is used when calculating the sender's overall reputation. The overall reputation score is calculated using the formula shown below:

sender_reputation = txrep_weight_email_ip * email_ip_reputation +
                    txrep_weight_email    * email_reputation    +
                    txrep_weight_domain   * domain_reputation   +
                    txrep_weight_ip       * ip_reputation       +
                    txrep_weight_helo     * helo_reputation

The default values of the weight factors:

  • txrep_weight_email_ip = 10 (of total 19.5, hence 51%)
  • txrep_weight_email = 3 (of total 19.5, hence 15%)
  • txrep_weight_domain = 2 (of total 19.5, hence 10%)
  • txrep_weight_ip = 4 (of total 19.5, hence 21%)
  • txrep_weight_helo = 0.5 (of total 19.5, hence 3%)
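
The weighted sum and the percentages quoted in the list can be reproduced with a short Python sketch. It is only an illustration of the formula as printed, not TxRep's actual code: the function names and the dictionary input format are hypothetical, and the sketch assumes a reputation value is available for all five identifier types. The printed formula does not divide by the total weight, so the sketch does not either.

```python
# Default weights as listed above.
DEFAULT_WEIGHTS = {
    "email_ip": 10.0,
    "email": 3.0,
    "domain": 2.0,
    "ip": 4.0,
    "helo": 0.5,
}


def sender_reputation(reputations, weights=DEFAULT_WEIGHTS):
    """Weighted sum of the per-identifier reputations, as in the formula above.

    `reputations` maps each identifier type to its reputation value
    (a hypothetical input format chosen for this sketch).
    """
    return sum(weights[kind] * reputations[kind] for kind in weights)


def weight_shares(weights=DEFAULT_WEIGHTS):
    """Share of each weight in the total weight, as a rounded percentage."""
    total = sum(weights.values())
    return {kind: round(100 * w / total) for kind, w in weights.items()}


if __name__ == "__main__":
    print(weight_shares())
```

Calling `weight_shares()` with the defaults yields the same percentages as the list (51, 15, 10, 21 and 3), which shows that the email-plus-IP identifier dominates the result.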

Depending on the configuration, TxRep uses either a global storage to keep the reputation records (the same for all users) or a user storage (a separate storage for each user ID that can run SpamAssassin). Alternatively, when txrep_user2global_ratio is enabled, both storages are used concurrently. In that case, each of the two reputations is calculated in the same way as shown above, using the sender values from the respective storage (when available), and then the overall reputation is calculated with the following formula:

total reputation = ( txrep_user2global_ratio * user + global ) / ( txrep_user2global_ratio + 1 )

The default value of txrep_user2global_ratio is 0 (dual storage disabled). The setting takes values between 0 and 10. A value around 2 may be a good starting point when enabling the feature (the user storage reputation then has twice the weight of the global reputation). Before enabling dual storage, make sure your system is configured to call SpamAssassin under the respective user ID. In many installations, SA is always called with the same user ID; in such cases, activating dual storage would be useless.
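
As an illustration of the dual-storage formula, the following Python sketch blends a user reputation and a global reputation. The function name and argument names are hypothetical, and the range check simply mirrors the 0 to 10 range stated above; it is not taken from TxRep's code.

```python
def total_reputation(user, global_, ratio):
    """Blend the user and global reputations per the formula above.

    `ratio` plays the role of txrep_user2global_ratio.
    """
    if not 0 <= ratio <= 10:
        raise ValueError("txrep_user2global_ratio is expected to be between 0 and 10")
    return (ratio * user + global_) / (ratio + 1)


if __name__ == "__main__":
    # With ratio 2, the user reputation counts twice as much as the global one.
    print(total_reputation(user=3.0, global_=0.0, ratio=2))
```

With a ratio of 2, a user reputation of 3 and a global reputation of 0, the result is 2: two thirds of the weight comes from the user storage and one third from the global storage.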

The overall txrep_factor can be changed in the configuration to tune the impact of the reputation, which may be useful when starting out. The value of the corrective TXREP tag is calculated in the following way:

 corrected score = current score + txrep_factor * (reputation + current score)/(count+1)
 TXREP tag value = corrected score - current score

The default value of txrep_factor is 0.5, but unlike with AWL, the final result also depends on the number of messages recorded for the given sender. As a result, a factor of 0.5 is equivalent to an AWL factor of 0.25 for senders with one record, and its influence rises logarithmically toward the projected value of 0.5 as more messages from the sender are recorded.

In addition to the algorithms shown above, the reputation is also influenced by the txrep_dilution_factor. This factor was introduced to help wear out the influence of old records. When the factor is used, the new score always has a slightly higher weight than the stored values, so the influence of old records progressively drops with each new message from the sender. The formula below is used:

newtotal = (oldcount + 1) * (newscore + txrep_dilution_factor  * oldtotal) / (txrep_dilution_factor * oldcount + 1)

The default value of the txrep_dilution_factor is 0.98, and it takes values between 0.7 (fast dilution / expiry), and 1.0 (no dilution at all).
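
The effect of the dilution formula can be checked with a small Python sketch. The function and argument names are hypothetical; the body is a direct transcription of the formula above, not TxRep's implementation.

```python
def diluted_total(old_total, old_count, new_score, dilution=0.98):
    """New stored total after adding one message, per the dilution formula above.

    `dilution` plays the role of txrep_dilution_factor.
    """
    return (old_count + 1) * (new_score + dilution * old_total) / (dilution * old_count + 1)


if __name__ == "__main__":
    # Ten stored messages with a total of 50 (mean 5), then a new score of -5.
    print(diluted_total(50.0, 10, -5.0, dilution=1.0))  # no dilution
    print(diluted_total(50.0, 10, -5.0))                # default dilution
```

With a factor of 1.0 the formula reduces to a plain sum of the old total and the new score (45 in this example). With the default of 0.98 the new total is slightly lower (about 44.81), so the mean moves a little further toward the new score than it would without dilution.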

The diagram txrep-diagram.gif illustrates the calculation of the TXREP tag value.

How do I train spam/ham?

In exactly the same way (and at the same time) as you train the Bayesian SA system on spam and ham:

...

How do I whitelist/blacklist someone?

See ManualWhitelist for the different options available in SpamAssassin. With TxRep, blacklisting and whitelisting can be done manually with the help of the following SpamAssassin command-line options:

...

It is important to understand that whitelisting/blacklisting through TxRep is not the same as whitelisting/blacklisting in a cf file using the whitelist_from or blacklist_from directives. TxRep whitelisting/blacklisting adjusts the reputation of the plain email address by a high score (details can be found in the TxRep POD). This blacklisted or whitelisted reputation score can wear out over time, as the scores of new messages from the sender are added to the total reputation score.

...

Although it has been requested, there is currently no Redis storage handler available for AWL or TxRep. However, MySQL storage tuned with the MEMORY engine, or with the InnoDB engine and a sufficiently large buffer pool (innodb_buffer_pool_size), or combined with the MySQL memcache plugin, would offer performance similar to Redis while allowing much better vertical and horizontal scalability (it would work better both for bigger tables and for multiple concurrent accesses).

Contributors