The Bayes database stores up to a fixed number of tokens, configured via bayes_expiry_max_db_size in local.cf (default: 150000 tokens).
Each token has an access time, which records when it last contributed to a classification or appeared in a learned email. A mixture of obsolete (often ephemeral) tokens and the least frequently seen tokens is occasionally purged, according to a schedule and algorithm explained in the sa-learn documentation.
Thus, even if you force an expiry run every month, it doesn't mean that you only have a month of data; the most important tokens never get purged.
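One way to check this on your own system is to compare the token count before and after a forced expiry run. The sketch below assumes that sa-learn is on the PATH and that `sa-learn --dump magic` prints the token count in the third column of its `ntokens` line. The helper name `ntokens` is made up for illustration.

```shell
# Hypothetical helper: read `sa-learn --dump magic` output on stdin
# and print the third column of the "ntokens" line.
ntokens() {
    awk '/non-token data: ntokens/ { print $3 }'
}

# Usage (needs sa-learn and an existing Bayes database):
#   before=$(sa-learn --dump magic | ntokens)
#   sa-learn --force-expire
#   after=$(sa-learn --dump magic | ntokens)
#   echo "tokens before: $before, after: $after"
```

If the database is below the configured size limit, the two counts may well be identical, because an expiry run only removes tokens when it finds something worth purging.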
To view the access time of the oldest token in the database (BSD `date` and `cut` syntax): `date -r $(sa-learn --dump magic | grep "oldest atime" | cut -f 3 -w)`
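The `date -r` and `cut -w` options used above are BSD-specific; GNU `date -r` expects a file name and GNU `cut` has no `-w`. A more portable sketch follows, assuming the `oldest atime` line carries the epoch value in its third column and that perl is available (it normally is wherever SpamAssassin runs). The helper names `oldest_atime` and `epoch_to_date` are illustrative only.

```shell
# Hypothetical helper: read `sa-learn --dump magic` output on stdin
# and print the epoch value from the "oldest atime" line (third column).
oldest_atime() {
    awk '/oldest atime/ { print $3 }'
}

# Hypothetical helper: print epoch seconds ($1) as a local date string.
# Uses perl rather than date(1) to avoid the GNU/BSD option differences.
epoch_to_date() {
    perl -le 'print scalar localtime(shift)' "$1"
}

# Usage (needs sa-learn and an existing Bayes database):
#   epoch_to_date "$(sa-learn --dump magic | oldest_atime)"
```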
\[partially adapted from a [post](http://mail-archives.apache.org/mod_mbox/spamassassin-users/201405.mbox/%3C20140517142609.4d1ee700@gumby.homeunix.com%3E) by RW to the spamassassin-users mailing list\]