Setting up Site-Wide Bayesian Filtering
In local.cf, tell SpamAssassin where to find the Bayesian database files:
bayes_path /etc/mail/spamassassin/bayes
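This tells SpamAssassin that the Bayesian filter database files will be /etc/mail/spamassassin/bayes_msgcount, bayes_seen and bayes_toks. The value of bayes_path is a prefix, and each database file name is that prefix plus a suffix. Feel free to put them wherever you want.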
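Because bayes_path is only a prefix, it can help to check which files it actually points at. Here is a minimal sketch that prints each file the prefix implies and whether it exists yet. The files typically only appear once sa-learn has run. bayes_files is a made-up helper name, not part of SpamAssassin, and the script assumes the three suffixes named above; your SpamAssassin version may create others, such as bayes_journal.

```shell
#!/bin/sh
# Sketch: list the database files implied by a bayes_path prefix and
# report whether each one exists yet. bayes_files is a hypothetical helper.
# Only the three suffixes named in this article are checked; your
# SpamAssassin version may create additional files.
bayes_files() {
  prefix=$1
  for suffix in msgcount seen toks; do
    # Braces are needed: $prefix_ would be read as a variable named prefix_.
    f=${prefix}_$suffix
    if [ -e "$f" ]; then
      echo "present: $f"
    else
      echo "missing: $f"
    fi
  done
}

bayes_files /etc/mail/spamassassin/bayes
```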
Now start feeding the Bayesian filter spam and ham (legitimate) messages. The -C option tells sa-learn to use /etc/mail/spamassassin as its configuration directory, so it reads local.cf and picks up the same bayes_path, i.e. the same bayes_msgcount, bayes_seen and bayes_toks files:
sa-learn --spam -C /etc/mail/spamassassin --showdots --dir /path/to/directory/full/of/spam/msgs
sa-learn --ham -C /etc/mail/spamassassin --showdots --dir /path/to/directory/full/of/ham/msgs
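If you retrain regularly, the two sa-learn commands above can be wrapped in a small script. This is only a sketch: train_bayes, CONF_DIR and SA_LEARN are names I made up for illustration. The script checks that both directories exist and then runs the exact sa-learn options shown above, spam first, stopping if the spam run fails. Setting SA_LEARN=echo prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: train the site-wide Bayes database from a spam directory and a
# ham directory with the same sa-learn options used in this article.
# CONF_DIR, SA_LEARN and train_bayes are hypothetical names.
CONF_DIR=${CONF_DIR:-/etc/mail/spamassassin}
SA_LEARN=${SA_LEARN:-sa-learn}   # set SA_LEARN=echo for a dry run

train_bayes() {
  spam_dir=$1
  ham_dir=$2
  # Refuse to run if either argument is not an existing directory.
  for d in "$spam_dir" "$ham_dir"; do
    if [ ! -d "$d" ]; then
      echo "train_bayes: not a directory: $d" >&2
      return 1
    fi
  done
  # Learn spam first; only go on to ham if that succeeded.
  "$SA_LEARN" --spam -C "$CONF_DIR" --showdots --dir "$spam_dir" || return 1
  "$SA_LEARN" --ham -C "$CONF_DIR" --showdots --dir "$ham_dir"
}
```

For example: train_bayes /path/to/directory/full/of/spam/msgs /path/to/directory/full/of/ham/msgs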
If spamd is already running, restart it so that it re-reads local.cf and enables the Bayes filter:
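kill `ps axo %p%a | awk '/[s]pamd/ { print $1 }'`
spamd -x -q -d -L -u nobody
(Your spamd options may be different from mine. The [s] in the awk pattern stops awk from matching its own command line and printing its own PID.)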
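A slightly more careful version of the restart is sketched below. It assumes your ps accepts the same axo %p%a format used above, and that spamd shows up in the listing either as an argument named spamd or as a path ending in /spamd (for example /usr/bin/perl -T -w /usr/bin/spamd -d). Matching whole arguments, rather than any line containing the string, keeps it from picking up its own awk process or a script whose file name merely contains spamd. spamd_pids, restart_spamd and SPAMD_OPTS are illustrative names, and SPAMD_OPTS defaults to my options, so change it to yours. Run it as the user that started spamd, usually root.

```shell
#!/bin/sh
# Sketch: stop any running spamd, then start it again so it re-reads local.cf.
# spamd_pids, restart_spamd and SPAMD_OPTS are hypothetical names.
# The default options are the author's; override SPAMD_OPTS for your site.
SPAMD_OPTS=${SPAMD_OPTS:-"-x -q -d -L -u nobody"}

# Read a "PID ARGS..." listing on stdin and print the PID of every line
# with an argument that is exactly "spamd" or a path ending in "/spamd".
# Other processes with such an argument (e.g. an editor open on a file
# named spamd) would match too.
spamd_pids() {
  awk '{
    for (i = 2; i <= NF; i++)
      if ($i == "spamd" || $i ~ /\/spamd$/) { print $1; next }
  }'
}

restart_spamd() {
  pids=$(ps axo %p%a | spamd_pids)
  if [ -n "$pids" ]; then
    # Unquoted on purpose: pass each PID as a separate argument.
    kill $pids
    # Give the old processes up to 10 seconds to exit.
    i=0
    while [ -n "$(ps axo %p%a | spamd_pids)" ] && [ "$i" -lt 10 ]; do
      sleep 1
      i=$((i + 1))
    done
  fi
  # Unquoted on purpose: split the option string into separate words.
  spamd $SPAMD_OPTS
}
```

After editing local.cf, calling restart_spamd stops the old daemon and starts a new one with your options.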