Ideas for Google Summer of Code 2006
These are possibilities, and won't be considered until they're listed on the real page at http://wiki.apache.org/general/SummerOfCode2006 . Anyone planning to mentor one of these will also need to sign up there.
Task Proposals
Subject ID: spamassassin-easy-mass-check
Title: Nightly Mass Check for Normal People
ASF Project: SpamAssassin - http://SpamAssassin.apache.org/
Keywords: perl, email, corpora, distributed, community
Description:
We need a way to make the nightly mass-check easily accessible to normal users. They need easy-to-use software to run mass-checks and submit results, and they must be properly trained on the sorting rules. The project then needs some way to track the level of trust of this growing number of submitters. See SocNightlyMassCheck.
Possible Mentors: Justin Mason (jm at jmason.org)
Status: -
Subject ID: spamassassin-persistent-db-conns
Title: Persistent database connections
ASF Project: SpamAssassin - http://SpamAssassin.apache.org/
Keywords: perl, databases, sql
Description:
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=2037 :
Persistent database connections for SpamAssassin's Bayes subsystem. Michael:
'This exists, but is not an ASL friendly license. So a "clean room" implementation might be cool.'
Possible Mentors: -
Status: -
Subject ID: spamassassin-separate-expiry
Title: Helper process for Bayes expiry
ASF Project: SpamAssassin - http://SpamAssassin.apache.org/
Keywords: perl, bayes, spamd, processes
Description:
Theo said: 'I also suggested having things like Bayes expiry and such being passed back to the parent who can spawn a helper process to do the work. That way the children processes will be able to accept, process, return the result, notify parent for bayes work, go back to listening. Right now we do: accept, process, do bayes work, return result, go back to listening, which ends up causing timeouts and possibly eats up all processing children.'
Possible Mentors: -
Status: -
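As a rough illustration of the hand-off Theo describes for spamassassin-separate-expiry, here is a minimal sketch in Python (not spamd's Perl code). The pipe layout, the one-byte "E" message, and all function names are assumptions made up for this example. The point it shows is only the ordering: the child returns its result first, then notifies the parent, and the parent spawns a separate helper for the slow work.

```python
import os

def scanning_child(request, result_w, notify_w):
    """Child: return the scan result, then just notify the parent."""
    try:
        os.write(result_w, ("scanned: " + request).encode())
        os.write(notify_w, b"E")  # hypothetical one-byte "expiry due" message
    finally:
        os._exit(0)  # never fall back into the parent's code path

def expiry_helper(done_w):
    """Helper: the slow Bayes expiry would run here, outside any child."""
    try:
        os.write(done_w, b"expiry done")
    finally:
        os._exit(0)

def parent_cycle(request):
    """Run one request through a child, then spawn a helper if asked to."""
    result_r, result_w = os.pipe()
    notify_r, notify_w = os.pipe()
    done_r, done_w = os.pipe()
    try:
        child = os.fork()
        if child == 0:
            scanning_child(request, result_w, notify_w)
        result = os.read(result_r, 1024).decode()
        os.waitpid(child, 0)

        helper_report = None
        if os.read(notify_r, 1) == b"E":
            helper = os.fork()
            if helper == 0:
                expiry_helper(done_w)
            helper_report = os.read(done_r, 1024).decode()
            os.waitpid(helper, 0)  # a real daemon would reap asynchronously
        return result, helper_report
    finally:
        for fd in (result_r, result_w, notify_r, notify_w, done_r, done_w):
            os.close(fd)
```

In this sketch the parent waits for the helper only to keep the example short; in a real daemon the parent would keep dispatching requests while the helper runs.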
Subject ID: spamassassin-arf-plugin
Title: ARF plugin
ASF Project: SpamAssassin - http://SpamAssassin.apache.org/
Keywords: arf, plugins, reporting, perl
Description:
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4812 :
ARF is a spam-report format for feedback loops for ISPs; there's been some interest in SpamAssassin understanding this and being able to match metadata inside the messages being reported.
Possible Mentors: -
Status: -
Subject ID: spamassassin-httpd-spamd
Title: Finish up Apache::SpamD httpd module
ASF Project: SpamAssassin - http://SpamAssassin.apache.org/
Keywords: apache, httpd, modules, perl
Description:
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4603 :
Finish up and polish the Apache::SpamD httpd module.
Possible Mentors: -
Status: -
Subject ID: spamassassin-quarantine-config-ui
Title: Quarantine / user-configuration web UI
ASF Project: SpamAssassin - http://SpamAssassin.apache.org/
Keywords: web, ui, quarantine, user-configuration, cgi, perl
Description:
Create a web application for message quarantine or user configuration, as part of the SpamAssassin project.
Possible Mentors: -
Status: -
Subject ID: spamassassin-corpus
Title: Maintain a SpamAssassin corpus of messages
ASF Project: SpamAssassin - http://SpamAssassin.apache.org/
Keywords: corpora, mail, collection, perl
Description:
Theo said: 'I'd almost rather we shift this around and make a "SpamAssassin Corpora", have all of us focus on making that the best it can be, and use that for mass-checks, etc.'
This could be a good possibility. Contributors can upload their own mail corpora to a central web app where the mass-check occurs. The mail collections could be quickly checked for validity, and tagged based on how much privacy the user wants for their mails (therefore controlling further redistribution of those mails).
Related to 'spamassassin-easy-mass-check' above.
Possible Mentors: -
Status: -
Subject ID: spamassassin-rules-db
Title: Rules explanation database
ASF Project: SpamAssassin - http://SpamAssassin.apache.org/
Keywords: rules, wiki, web, perl
Description:
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4771 - It'd be nice for users, admins, and other interested parties to have an easy way to look up a human-readable description of a rule, based on the rule name; using the wiki as part of that would be the best solution.
Possible Mentors: -
Status: -
Subject ID: spamassassin-better-reload
Title: Better way to reload the spamd configuration
ASF Project: SpamAssassin - http://SpamAssassin.apache.org/
Keywords: reload, spamd, sighup, restart, perl
Description:
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4774 :
We currently reread the configuration by restarting the entire process. This is too heavyweight, and can be improved.
Possible Mentors: -
Status: -
Subject ID: spamassassin-message-test-suite
Title: A message-parser test suite
ASF Project: SpamAssassin - http://SpamAssassin.apache.org/
Keywords: testing, testsuite, parsing, mail, perl
Description:
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4559 : Every now and again, we come up against bugs in our message parser (MIME, HTML, headers, base64/qp decoding, etc.). We fix them, but occasionally there are regressions. I envisage the test suite as using a vast collection of message files, something like a mass-check corpus, and a set of tests to ensure the parser sees what it should be seeing.
Possible Mentors: -
Status: -
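To make the spamassassin-message-test-suite idea a little more concrete, here is a minimal sketch of one "message plus expected facts" test case. It uses Python's standard email package purely as a stand-in for SpamAssassin's own parser; the sample message, the EXPECTED dictionary, and the function names are all invented for illustration.

```python
import base64
from email import message_from_string

# One corpus entry: a raw message and what the parser should see in it.
SAMPLE = (
    "From: a@example.com\n"
    "Subject: test\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=us-ascii\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    + base64.b64encode(b"hello parser\n").decode() + "\n"
)
EXPECTED = {
    "subject": "test",
    "content_type": "text/plain",
    "body": b"hello parser\n",
}

def parse_facts(raw):
    """Parse a raw message and return the facts the suite checks."""
    msg = message_from_string(raw)
    return {
        "subject": msg["Subject"],
        "content_type": msg.get_content_type(),
        "body": msg.get_payload(decode=True),  # undoes base64/qp encoding
    }

def mismatches(raw, expected):
    """Return the names of the facts the parser got wrong."""
    seen = parse_facts(raw)
    return [name for name, value in expected.items() if seen.get(name) != value]
```

A regression would then show up as a non-empty list from mismatches() for some message in the collection.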
Subject ID: spamassassin-reduce-memory-usage
Title: Reduce memory footprint of spamd
ASF Project: SpamAssassin - http://SpamAssassin.apache.org/
Keywords: ram, memory, spamd, perl
Description:
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=3839 : It may be possible to reduce spamd's memory footprint through internal changes to SpamAssassin's engine. A risky project; major internal changes may never get applied, especially if they break other stuff.
Possible Mentors: -
Status: -
Subject ID: spamassassin-improved-chi
Title: Implement 'Improved Chi' in the BAYES rules
ASF Project: SpamAssassin - http://SpamAssassin.apache.org/
Keywords: bayes, chi, robinson, perl
Description:
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=3460 :
"Handling Redundancy in Email Token Probabilities", by Gary Robinson: http://www.garyrobinson.net/2004/04/improved_chi.html . This approach has shown good results. Implement it in SpamAssassin and benchmark it.
Possible Mentors: -
Status: -
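For anyone new to the spamassassin-improved-chi topic, here is a small Python sketch of plain chi-squared (Fisher-style) combining of token probabilities, the kind of baseline an "Improved Chi" implementation would be benchmarked against. It is not the improved variant from Robinson's article, and it is not claimed to match SpamAssassin's code line for line; the function names are made up for this example.

```python
import math

def chi2q(x2, dof):
    """Chi-squared survival function for an even number of degrees of freedom."""
    assert dof % 2 == 0
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def combine(probs):
    """Combine per-token spam probabilities (each strictly between 0 and 1)."""
    n = len(probs)
    ln_not_spam = sum(math.log(1.0 - p) for p in probs)
    ln_spam = sum(math.log(p) for p in probs)
    s = 1.0 - chi2q(-2.0 * ln_not_spam, 2 * n)  # evidence for spam
    h = 1.0 - chi2q(-2.0 * ln_spam, 2 * n)      # evidence for ham
    return (s - h + 1.0) / 2.0                  # 0 = ham, 0.5 = unsure, 1 = spam
```

Neutral tokens (all 0.5) combine to 0.5, while several strongly spammy tokens push the result close to 1.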
Subject ID: spamassassin-spamd-unix-and-tcp
Title: spamd should support both UNIX domain and TCP sockets
ASF Project: SpamAssassin - http://SpamAssassin.apache.org/
Keywords: spamd, unix-domain, sockets, networking, perl
Description:
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=3991 : Currently spamd supports either UNIX-domain or TCP/IP sockets for incoming scan requests, but not both. It should support both simultaneously, in the one set of daemon processes.
Possible Mentors: -
Status: -
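The general technique behind spamassassin-spamd-unix-and-tcp is to have one process wait on several listening sockets at once. Here is a minimal Python sketch of that, not spamd's actual code; the function names, the reply text, and the use of the selectors module are assumptions for illustration only.

```python
import os
import selectors
import socket

def open_listeners(unix_path, tcp_host="127.0.0.1", tcp_port=0):
    """Open one TCP and one UNIX-domain listener, registered in one selector."""
    tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    tcp.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    tcp.bind((tcp_host, tcp_port))
    tcp.listen()

    if os.path.exists(unix_path):
        os.unlink(unix_path)  # remove a stale socket file
    unix = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    unix.bind(unix_path)
    unix.listen()

    sel = selectors.DefaultSelector()
    sel.register(tcp, selectors.EVENT_READ, data="tcp")
    sel.register(unix, selectors.EVENT_READ, data="unix")
    return sel, tcp, unix

def serve(sel, max_requests, timeout=5.0):
    """Handle connections from either listener until max_requests is reached."""
    handled = []
    while len(handled) < max_requests:
        ready = sel.select(timeout=timeout)
        if not ready:
            break  # nothing arrived in time
        for key, _events in ready:
            conn, _addr = key.fileobj.accept()
            with conn:
                request = conn.recv(4096)
                conn.sendall(b"OK via " + key.data.encode() + b"\n")
            handled.append((key.data, request))
    return handled
```

The same accept loop serves both transports, so there is still only one set of daemon processes.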
Subject ID: spamassassin-dobly
Title: Benchmark and implement "Dobly" Noise Reduction
ASF Project: SpamAssassin - http://SpamAssassin.apache.org/
Keywords: dobly, bayes, classifiers, perl
Description:
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=3078 : Investigate "Dobly" noise reduction a la http://bnr.nuclearelephant.com/ , in a form that can be incorporated into SpamAssassin. Benchmark the results using 10-fold cross-validation.
Possible Mentors: -
Status: -
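For reference, 10-fold cross-validation (as asked for in spamassassin-dobly) splits the corpus into ten parts and uses each part once as the test set while training on the other nine. Here is a minimal Python sketch; the function names and the train_and_score callback are hypothetical, and a real benchmark would also shuffle and balance the ham and spam in each fold.

```python
def k_fold_splits(messages, k=10):
    """Yield (train, test) pairs; every message is tested exactly once."""
    folds = [messages[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [m for j, fold in enumerate(folds) if j != i for m in fold]
        yield train, test

def cross_validate(messages, train_and_score, k=10):
    """Average the score that train_and_score(train, test) returns per fold."""
    scores = [train_and_score(train, test)
              for train, test in k_fold_splits(messages, k)]
    return sum(scores) / len(scores)
```

Here train_and_score would train the classifier (with or without noise reduction) on the training messages and return, say, its accuracy on the test messages.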
Subject ID: spamassassin-secure-user-auth
Title: Secure user authentication in the spamd protocol
ASF Project: SpamAssassin - http://SpamAssassin.apache.org/
Keywords: spamd, protocol, tls, perl
Description:
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4550 :
Provide a secure method to authenticate users over a spamc/spamd connection.
Possible Mentors: -
Status: -
Template
Please use this for further project suggestions...
Subject ID: spamassassin-xxxx
Title: xxxxx
ASF Project: SpamAssassin - http://SpamAssassin.apache.org/
Keywords: xxxx, perl
Description:
xxxx
Possible Mentors: -
Status: -