Comment: details about the score calculation

...

The algorithm uses a local database of entries. Each entry has a key formed from the identifier and, optionally, the IP address the message originated from and the DKIM signature. Each entry stores the TOTAL score of the messages and the COUNT of messages; the MEAN score is TOTAL/COUNT. Each sender is identified by several IDs: the From email address in combination with the originating IP block (or DKIM signature, or SPF pass, if available), the standalone From email address (without any IP), the domain name of the From address, the full IP address, and the HELO name of the originating client. Each of these ID types has a configurable weight factor that is applied when calculating the sender's overall reputation. The overall reputation score is calculated using the formula shown below:

No Format

sender_reputation = txrep_weight_email_ip * email_ip_reputation +
                    txrep_weight_email    * email_reputation    +
                    txrep_weight_domain   * domain_reputation   +
                    txrep_weight_ip       * ip_reputation       +
                    txrep_weight_helo     * helo_reputation
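As a rough illustration, the entry structure and the weighted sum above could be sketched in Python as follows. This is not the plugin's code: the `Entry` class, the dictionary layout, and the weight values are hypothetical and chosen only for the example, and the sketch assumes that each `*_reputation` term is the MEAN (TOTAL/COUNT) of the matching entry.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One reputation record: TOTAL score and COUNT of messages."""
    total: float = 0.0
    count: int = 0

    @property
    def mean(self) -> float:
        # MEAN = TOTAL / COUNT; returning 0.0 for an empty entry is an
        # assumption made for this sketch only.
        return self.total / self.count if self.count else 0.0


# Illustrative weights only; these are NOT the plugin defaults.
WEIGHTS = {
    "email_ip": 0.5,
    "email": 0.2,
    "domain": 0.1,
    "ip": 0.15,
    "helo": 0.05,
}


def sender_reputation(entries, weights=WEIGHTS):
    """Weighted sum of the per-ID mean scores, as in the formula above."""
    return sum(weights[name] * entries[name].mean for name in weights)


example_entries = {
    "email_ip": Entry(total=8.0, count=4),   # mean  2.0
    "email": Entry(total=3.0, count=3),      # mean  1.0
    "domain": Entry(total=0.0, count=5),     # mean  0.0
    "ip": Entry(total=-2.0, count=2),        # mean -1.0
    "helo": Entry(total=4.0, count=1),       # mean  4.0
}
print(sender_reputation(example_entries))
```

With these made-up numbers the result is approximately 1.25, dominated by the email+IP entry because it carries the largest weight.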

Depending on the configuration, TxRep uses either a global storage to keep the reputation records (the same for all users) or a user storage (a separate storage for each user ID that can run SpamAssassin). Alternatively, when txrep_user2global_ratio is enabled, both storages are used concurrently. In that case, each of the two reputations is calculated as shown above, using the sender values from the respective storage (when available), and then the overall reputation is calculated with the following formula:

No Format

total reputation = ( txrep_user2global_ratio * user + global ) / ( txrep_user2global_ratio + 1 )
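A minimal Python sketch of this blend, assuming both the user and the global reputation are available; the function and argument names are hypothetical and only mirror the symbols in the formula.

```python
def total_reputation(user, global_rep, user2global_ratio):
    """Blend user and global reputations; the ratio is the weight of the
    user value relative to the global value (which has weight 1)."""
    return (user2global_ratio * user + global_rep) / (user2global_ratio + 1)


# With a ratio of 3 the user reputation counts three times as much
# as the global one: (3 * 2.0 + (-2.0)) / (3 + 1) = 1.0
print(total_reputation(user=2.0, global_rep=-2.0, user2global_ratio=3))
```

A ratio of 1 gives a plain average of the two values, and larger ratios pull the result toward the user's own storage.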

The overall txrep_factor can be changed in the configuration to tune the impact of the reputation, which may be useful when starting off. The value of the corrective TXREP tag is calculated in the following way:

No Format

 corrected score = current score + txrep_factor * (reputation + current score)/(count+1)
 TXREP tag value = corrected score - current score
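The two lines above can be transcribed directly into Python. The sketch below follows the formula exactly as written on this page and makes no claim about how the plugin implements it; the function and argument names are hypothetical.

```python
def txrep_tag(current_score, reputation, count, txrep_factor):
    """Return (corrected score, TXREP tag value) per the formula above."""
    corrected = current_score + txrep_factor * (reputation + current_score) / (count + 1)
    tag_value = corrected - current_score
    return corrected, tag_value


# current score 5.0, reputation 1.0, 3 stored messages, factor 0.5:
# 5.0 + 0.5 * (1.0 + 5.0) / 4 = 5.75, so the tag value is 0.75
print(txrep_tag(current_score=5.0, reputation=1.0, count=3, txrep_factor=0.5))
```

Setting txrep_factor to 0 leaves the score unchanged, which matches the description of the factor as a control over the impact of the reputation.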

In addition to the algorithms shown above, the reputation is also influenced by the txrep_dilution_factor. This factor was introduced to help wear off the influence of old records. When the factor is used, the new score always has a slightly higher weight than the stored values, so the influence of old records drops progressively with each new message from the sender. The formula below is used:

No Format

newtotal = (oldcount + 1) * (newscore + txrep_dilution_factor  * oldtotal) / (txrep_dilution_factor * oldcount + 1)
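To see the effect of the dilution, the formula can be tried out in a few lines of Python. This is only a sketch with hypothetical names and made-up numbers, not the plugin's code.

```python
def diluted_total(oldtotal, oldcount, newscore, dilution_factor):
    """New TOTAL after adding one message, per the formula above."""
    return (oldcount + 1) * (newscore + dilution_factor * oldtotal) / (
        dilution_factor * oldcount + 1
    )


# Stored record: 4 messages with TOTAL 8.0 (mean 2.0); new message scores 6.0.
plain = diluted_total(8.0, 4, 6.0, dilution_factor=1.0)    # 14.0, mean 2.8
diluted = diluted_total(8.0, 4, 6.0, dilution_factor=0.5)  # 50/3, mean ~3.33
print(plain / 5, diluted / 5)
```

With a factor of 1 the formula reduces to simple accumulation (oldtotal + newscore). With a factor below 1 the new mean moves further toward the new score, which is the progressive fading of old records described above.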

The diagram txrep-diagram.gif illustrates the calculation of the TXREP tag value.

How do I train spam/ham?

...