We're looking for people to volunteer and make code contributions: patches, code, Perl, regression tests, rules, you get the picture. You'll have to send in a [http://www.apache.org/licenses/#clas Contributor License Agreement] before your contribution can be accepted, but that's easy.
You'll need the latest version of SpamAssassin from SVN; see DownloadFromSvn.
So, what are we looking for right now?
The Top N items
Bug Fixes
Log into [http://bugzilla.spamassassin.org Bugzilla] and look for bugs that you can fix. Fix one, then attach a patch (made against the latest SVN) to its bug report. See also UsingBugzilla.
Documentation
- The best source of documentation for SpamAssassin is this wiki; unfortunately, most users look for documentation elsewhere: man pages, READMEs, etc. We need people to submit patches that improve the accuracy, completeness and clarity of that documentation. If you know Perl, you can also read our code to check that the documentation agrees with what the code actually does, or write man pages for Perl modules that don't have one yet (see man perlpod). (We've often had problems with READMEs that document the old behaviour of the code.) Really, anyone can help with this; if you don't know Perl, simply reading the docs and suggesting improvements is helpful. (DuncanFindlay)
New Rules
We are looking for an individual (or many individuals) to help us write new rules and to help merge rules written by others (e.g. rules from CustomRulesets, with permission of course) into our code base. A good knowledge of SpamAssassin and experience writing rules would be useful. If you're interested, let us know by sending mail to the [http://spamassassin.org/lists.html SpamAssassin-dev mailing list]. The developers generally focus on code issues rather than rules, so the more help we get with rules, the more time we have to devote to writing code. (DuncanFindlay)
Speed
- Submit code that speeds something up without breaking anything. As a rough guide, the minimum worthwhile gain is probably about 1% in overall message-checking speed.
Size
The AutoWhitelist ([http://bugzilla.spamassassin.org/show_bug.cgi?id=3082 bug 3082]) and bayes_seen databases need automatic expiry so that they stop growing without bound.
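Time-based expiry of the kind asked for above might look roughly like the following Python sketch. SpamAssassin itself is Perl, so treat this as an illustration of the idea, not a patch. It assumes each database entry carries a last-seen timestamp; the `Entry` layout and the `touch`/`expire` names are hypothetical and do not reflect the real AutoWhitelist or bayes_seen formats.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Entry:
    # Hypothetical record layout; the real AWL/bayes_seen formats differ.
    count: int          # number of messages that updated this entry
    total_score: float  # running sum of message scores (AWL-style)
    last_seen: float    # UNIX timestamp of the most recent update


def touch(db: dict[str, Entry], key: str, score: float, now: float) -> None:
    """Record one message for `key`, refreshing its last-seen time."""
    entry = db.get(key)
    if entry is None:
        db[key] = Entry(count=1, total_score=score, last_seen=now)
    else:
        entry.count += 1
        entry.total_score += score
        entry.last_seen = now


def expire(db: dict[str, Entry], max_age: float, now: float) -> int:
    """Delete entries not updated within `max_age` seconds; return how many."""
    stale = [key for key, e in db.items() if now - e.last_seen > max_age]
    for key in stale:
        del db[key]
    return len(stale)


DAY = 86400
db: dict[str, Entry] = {}
touch(db, "alice@example.com", -2.0, now=0)
touch(db, "bob@example.net", 8.5, now=100 * DAY)
print(expire(db, max_age=90 * DAY, now=100 * DAY))  # prints 1
print(sorted(db))  # prints ['bob@example.net']
```

Running something like `expire` periodically (from a cron job, or opportunistically while learning) would keep the database sized by recent activity rather than by all mail ever seen.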
Bayes accuracy and speed
- Code, and corpus tests, for ramping up the probability of previously unseen tokens. This could be done either heuristically or by keeping real counts of unseen tokens in the Bayes token database. The idea is that words that have never been learned before should get a high spam probability.
- A custom database file format and code, for faster performance and space savings (to be benchmarked against qdbm and tdb, which currently look like the most promising off-the-shelf databases).
- Bi-grams: that is, multi-word windowing as used in CRM-114, using two-word tokens (or possibly longer ones). It's not clear this will give much higher accuracy, though, now that spammers are using whole-phrase Bayes poisoning. (JustinMason)
- Implementing Dobly noise reduction: [http://bugzilla.spamassassin.org/show_bug.cgi?id=3078 bug 3078].
- Dynamically determining the autolearning thresholds based on incoming email rather than using hard-coded numbers. See [http://bugzilla.spamassassin.org/show_bug.cgi?id=1829 bug 1829] for more.
- Looking for specific header tokens when they change location between the original message and the reply. See [http://bugzilla.spamassassin.org/show_bug.cgi?id=2129 bug 2129] for more.
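For the unseen-token item in the Bayes list above, one established heuristic is Gary Robinson's smoothing, f(w) = (s*x + n*p(w)) / (s + n). Here n is how many times token w has been learned, p(w) is its raw spam probability, x is the probability assumed for a token with no data, and s is how strongly that assumption is weighted. An unseen token (n = 0) gets exactly x, and its score drifts toward p(w) as evidence accumulates, so choosing x above 0.5 gives new words a spam-leaning probability. The Python below is only a sketch of that idea. The defaults `s=0.5` and `x=0.6` are illustrative guesses, not SpamAssassin's tuned values.

```python
def raw_spam_prob(spam_hits: int, ham_hits: int, n_spam: int, n_ham: int) -> float:
    """p(w): token frequency in spam relative to ham, normalised by corpus size."""
    spam_freq = spam_hits / n_spam
    ham_freq = ham_hits / n_ham
    return spam_freq / (spam_freq + ham_freq)


def smoothed_prob(spam_hits: int, ham_hits: int, n_spam: int, n_ham: int,
                  s: float = 0.5, x: float = 0.6) -> float:
    """Robinson-style f(w) = (s*x + n*p(w)) / (s + n).

    With n == 0 (token never learned) this returns x, so x > 0.5 makes
    unseen tokens lean towards spam. s and x are illustrative defaults.
    """
    n = spam_hits + ham_hits
    if n == 0:
        return x
    p = raw_spam_prob(spam_hits, ham_hits, n_spam, n_ham)
    return (s * x + n * p) / (s + n)


# Made-up corpus of 1000 learned spams and 1000 learned hams.
print(smoothed_prob(0, 0, 1000, 1000))   # unseen token: 0.6
print(smoothed_prob(10, 0, 1000, 1000))  # seen 10 times, only in spam: ~0.981
print(smoothed_prob(0, 1, 1000, 1000))   # seen once, only in ham: ~0.2
```

Whether corpus tests would favour this over keeping real counts of unseen tokens is exactly what the item asks contributors to measure.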
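The bi-gram item can be illustrated with a short Python sketch that emits every single word plus every run of n adjacent words as extra tokens. This is plain contiguous windowing under simplifying assumptions. The regex tokenizer is a toy, far cruder than SpamAssassin's, and CRM-114's own scheme (sparse binary polynomial hashing) also generates skip-word combinations that this sketch does not.

```python
import re


def words(text: str) -> list[str]:
    """Toy tokenizer: lowercase runs of letters, digits, $, ' and -."""
    return re.findall(r"[a-z0-9$'-]+", text.lower())


def window_tokens(ws: list[str], n: int = 2) -> list[str]:
    """Return the single words followed by every run of n adjacent words."""
    if n < 2:
        raise ValueError("n must be at least 2")
    tokens = list(ws)
    for i in range(len(ws) - n + 1):
        tokens.append(" ".join(ws[i:i + n]))
    return tokens


print(window_tokens(words("Buy CHEAP meds now!")))
# ['buy', 'cheap', 'meds', 'now', 'buy cheap', 'cheap meds', 'meds now']
```

Keeping the single words alongside the pairs means a message still scores on familiar vocabulary while the pairs add context; the cost is a noticeably larger token database.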
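As one possible reading of the dynamic-autolearn item (not necessarily what bug 1829 proposes), the Python sketch below derives both thresholds from percentiles of recent message scores. This replaces fixed values like those set by `bayes_auto_learn_threshold_nonspam` and `bayes_auto_learn_threshold_spam`. The percentile choices, the minimum sample size and the function names are all assumptions made for illustration.

```python
from __future__ import annotations

from statistics import quantiles


def dynamic_thresholds(recent_scores: list[float], ham_pct: int = 10,
                       spam_pct: int = 90, min_samples: int = 100) -> tuple[float, float]:
    """Return (ham_threshold, spam_threshold) as percentiles of recent scores.

    Hypothetical policy: the lowest ham_pct% of recent mail is considered
    safe to learn as ham, the top (100 - spam_pct)% safe to learn as spam.
    """
    if len(recent_scores) < min_samples:
        raise ValueError("not enough recent scores to estimate thresholds")
    if not 1 <= ham_pct < spam_pct <= 99:
        raise ValueError("need 1 <= ham_pct < spam_pct <= 99")
    cuts = quantiles(recent_scores, n=100)  # 99 cut points: 1st..99th percentile
    lo, hi = cuts[ham_pct - 1], cuts[spam_pct - 1]
    if lo >= hi:
        raise ValueError("recent scores too uniform to separate ham from spam")
    return lo, hi


def autolearn_decision(score: float, thresholds: tuple[float, float]) -> str | None:
    """Return "ham", "spam", or None (don't autolearn this message)."""
    ham_thr, spam_thr = thresholds
    if score < ham_thr:
        return "ham"
    if score >= spam_thr:
        return "spam"
    return None


thresholds = dynamic_thresholds(list(range(1, 101)))  # made-up scores 1..100
print(thresholds)                            # (10.1, 90.9)
print(autolearn_decision(95.0, thresholds))  # spam
```

A pure percentile rule always learns a fixed share of mail on each side, so a real implementation would probably need to clamp the dynamic values against sane static limits.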
Other ideas
- Translation: translating the rule descriptions, the manual and the website into other languages.