The SpamAssassin Challenge
(THIS IS A DRAFT; see \[http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5376 bug 5376 for discussion\])
The \[http://www.netflixprize.com/ Netflix Prize\] is a machine-learning challenge from Netflix which 'seeks to substantially improve the accuracy of predictions about how much someone is going to love a movie based on their movie preferences.'
We in SpamAssassin have similar problems; maybe we can solve them in a similar way. We have:
...
Input: the test data: mass-check logs
...
We will take the [SpamAssassin] 3.2.0 mass-check logs and split them into training and test sets; a split of 90% for training and 10% for testing is traditional. Any cleanups that we had to do during \[http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5270 bug 5270\] are re-applied.
The test set is saved, and not published.
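The 90%/10% split described above could be sketched roughly as follows. This is only an illustration, not the procedure the project will necessarily use. It assumes one message per log line and that lines starting with `#` are comments. The function name `split_log_lines` and its parameters are hypothetical.

```python
import random


def split_log_lines(lines, test_fraction=0.1, seed=0):
    """Split mass-check log lines into (train, test) lists.

    Assumptions (not confirmed by the text above): each non-comment
    line describes one message, and comment lines start with '#'.
    """
    # Drop blank lines and comment lines; keep one record per message.
    records = [ln for ln in lines if ln.strip() and not ln.startswith("#")]

    # Shuffle with a fixed seed so the same split can be regenerated.
    rng = random.Random(seed)
    rng.shuffle(records)

    # The first n_test shuffled records form the held-back test set.
    n_test = round(len(records) * test_fraction)
    return records[n_test:], records[:n_test]
```

Shuffling before the cut keeps the test set from being dominated by whichever corpus happens to come last in the log. The fixed seed means the unpublished test set can be rebuilt from the same input if needed.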
...