Setting up Site-Wide Bayesian Filtering
In local.cf, tell SpamAssassin where to find the Bayesian database files:
bayes_path /etc/mail/spamassassin/bayes
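Note that bayes_path is a file name prefix, not a directory: with this setting the Bayesian filter database files will be /etc/mail/spamassassin/bayes_msgcount, /etc/mail/spamassassin/bayes_seen and /etc/mail/spamassassin/bayes_toks. Feel free to point the prefix at any other location you prefer.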
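As a quick sanity check, the file names that follow from a given prefix can be listed with a small shell sketch. The function name bayes_files is made up for illustration and is not part of SpamAssassin; the three suffixes are simply the ones named above.

```shell
#!/bin/sh
# Hypothetical helper: print the database file names implied by a
# bayes_path prefix (the suffixes are the ones listed in the text above).
bayes_files() {
    prefix=$1
    for suffix in msgcount seen toks; do
        printf '%s_%s\n' "$prefix" "$suffix"
    done
}

# Report which of the expected files already exist.
bayes_files /etc/mail/spamassassin/bayes | while read -r f; do
    if [ -e "$f" ]; then
        echo "found:   $f"
    else
        echo "missing: $f"
    fi
done
```

Before any training has been done, it is normal for all three files to be reported as missing.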
Now start feeding the Bayesian filter spam and ham messages. Tell sa-learn to use /etc/mail/spamassassin as the configuration directory (i.e. where to find the bayes_msgcount, _seen and _toks files):
sa-learn --spam -C /etc/mail/spamassassin --showdots --dir /path/to/directory/full/of/spam/msgs
sa-learn --ham -C /etc/mail/spamassassin --showdots --dir /path/to/directory/full/of/ham/msgs
See SiteWideBayesFeedback for more tips on getting an entire site to feed spam and ham messages back into the Bayesian filter. Whichever method you use, pass -C so that the correct database files are updated.
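The two training commands can be wrapped so that -C is never forgotten. This is only a sketch: train_bayes, SA_LEARN and CONF_DIR are hypothetical names chosen for illustration, and the sa-learn options are exactly the ones used above.

```shell
#!/bin/sh
# Hypothetical wrapper around the two sa-learn invocations shown above.
# SA_LEARN and CONF_DIR are illustrative variables, not SpamAssassin settings.
SA_LEARN=${SA_LEARN:-sa-learn}
CONF_DIR=${CONF_DIR:-/etc/mail/spamassassin}

train_bayes() {
    kind=$1   # "spam" or "ham"
    dir=$2    # directory containing the messages to learn from

    case $kind in
        spam|ham) ;;
        *) echo "usage: train_bayes spam|ham DIR" >&2; return 2 ;;
    esac
    if [ ! -d "$dir" ]; then
        echo "train_bayes: not a directory: $dir" >&2
        return 1
    fi

    # Always pass -C so that the site-wide database is the one updated.
    "$SA_LEARN" "--$kind" -C "$CONF_DIR" --showdots --dir "$dir"
}
```

It would then be called as train_bayes spam /path/to/directory/full/of/spam/msgs, and likewise with ham for the ham directory.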
Also restart spamd if you're running it already so that it will re-read local.cf and enable the Bayes filter:
ps axo %p%a | awk '/spamd/ { print $1 }'
spamd -x -q -d -L -u nobody
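(Your spamd options may differ from mine. Note that the ps pipeline only prints the PIDs of the running spamd processes; stop those processes before starting the new daemon.)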
You may run into permission problems. Make sure you chmod the bayes files so that they are readable and writable by the group your users belong to.
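One way to apply those permissions is sketched below. The helper name share_bayes_files is hypothetical, and the optional group argument is an assumption about your site: use whatever group the accounts running SpamAssassin actually share.

```shell
#!/bin/sh
# Hypothetical helper: make every file sharing a bayes_path prefix
# readable and writable by its owner and group, and by nobody else.
share_bayes_files() {
    prefix=$1
    group=${2:-}   # optional group name; ownership is left alone when omitted

    for f in "$prefix"_*; do
        [ -e "$f" ] || continue          # the glob matched nothing
        if [ -n "$group" ]; then
            chgrp "$group" "$f" || return 1
        fi
        chmod 660 "$f" || return 1
    done
}
```

For the setup above this would be run as share_bayes_files /etc/mail/spamassassin/bayes, optionally followed by a group name.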
If you are running spamd in setuid mode (it setuids to the user who ran spamc), you will probably need to set bayes_file_mode in local.cf. Otherwise, the bayes file permissions will default to 0700 when the first caller causes updates, and subsequent callers will lack the permissions to open these files.
In local.cf (your settings may vary):
bayes_file_mode 0770
See Mail::SpamAssassin::Conf(3) for details.