10-Fold Cross-Validation
This is a log of what I did to run a 10-fold cross-validation test of the perceptron vs. the genetic algorithm (GA) when testing bug 2910 (http://bugzilla.spamassassin.org/show_bug.cgi?id=2910). – JustinMason 21/01/04
Check it out:

    svn co https://svn.apache.org/repos/asf/incubator/spamassassin/trunk
    ...
Also get PGAPack and install it as "masses/pgapack". I just scp'd in an already-built tree I had here.
Then use the set-0 logs from the 2.60 GA run, taken from the rsync repository:

    wc -l /home/corpus-rsync/corpus/Obsolete/submit-2.60-GA-run1/ham-set0.log /home/corpus-rsync/corpus/Obsolete/submit-2.60-GA-run1/spam-set0.log
    210442 /home/corpus-rsync/corpus/Obsolete/submit-2.60-GA-run1/ham-set0.log
    354479 /home/corpus-rsync/corpus/Obsolete/submit-2.60-GA-run1/spam-set0.log
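For scale, here is the arithmetic on those counts as a small shell sketch (not part of the original run): split evenly across 10 buckets, the full logs would leave roughly 21k ham and 35k spam lines per bucket. `per_bucket` is a hypothetical helper, and the inputs are the raw `wc -l` figures, so any comment lines in the logs are counted too.

```shell
#!/bin/sh
# Hypothetical helper: lines per bucket if a log of $1 lines is split
# evenly into $2 buckets (integer division, remainder dropped).
per_bucket() {
    echo $(( $1 / $2 ))
}

# Raw wc -l counts from above; these include any comment lines.
echo "ham per bucket:  $(per_bucket 210442 10)"    # prints 21044
echo "spam per bucket: $(per_bucket 354479 10)"    # prints 35447
```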
We want about 2k in each bucket; otherwise it'll take weeks to complete. Use split-logs-into-buckets to juggle the log files in blocks of 10% to get the ratio and size to around 2k:2k.

    ...
    mv split-*.log ../../logs/spam-jm/
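The bucket-splitting idea can be pictured with the shell sketch below. It is a hypothetical stand-in, not the real split-logs-into-buckets script (whose options aren't shown here): it deals the lines of one log round-robin into N files named `<prefix>-1.log` through `<prefix>-N.log`, and it assumes that lines starting with `#` are comments to be skipped.

```shell
#!/bin/sh
# Hypothetical stand-in for split-logs-into-buckets (NOT that script):
# deal the non-comment lines of a log round-robin into N bucket files.
split_round_robin() {
    # $1 = input log, $2 = number of buckets, $3 = output file prefix
    awk -v n="$2" -v p="$3" '
        /^#/ { next }    # assumption: lines starting with # are comments
        { b = (c++ % n) + 1; print > (p "-" b ".log") }
    ' "$1"
}

# Tiny demo: a header comment plus five log lines, split into 2 buckets.
tmp=$(mktemp -d)
printf '# header\nline1\nline2\nline3\nline4\nline5\n' > "$tmp/ham.log"
split_round_robin "$tmp/ham.log" 2 "$tmp/split-ham"
cat "$tmp/split-ham-1.log"    # line1, line3, line5
cat "$tmp/split-ham-2.log"    # line2, line4
```

Dealing lines round-robin keeps every bucket within one line of the others in size; hitting a target such as 2k per bucket still means cutting the input log down first.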
And double-check the log sizes:

    ...
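One way to do that check from the shell is sketched below. `count_messages` is a hypothetical helper; it assumes lines beginning with `#` in a mass-check log are comments, so it reports non-comment lines rather than the raw `wc -l` total.

```shell
#!/bin/sh
# Hypothetical helper: print "<non-comment line count> <file>" for each log.
# Assumption: lines starting with # are comments, not messages.
count_messages() {
    for f in "$@"; do
        # grep -vc prints 0 (and exits 1) when no line qualifies; the count is still printed.
        printf '%s %s\n' "$(grep -vc '^#' "$f")" "$f"
    done
}

# Demo on a throwaway file: one comment line and two entries.
tmp=$(mktemp -d)
printf '# comment\nentry1\nentry2\n' > "$tmp/split-1.log"
count_messages "$tmp"/split-*.log
```

Against the real buckets this would be run as something like `count_messages ../../logs/spam-jm/split-*.log`.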
Looks fine. Now run the 10pass master script:

    nohup sh -x ./tenpass/10pass-run &
Results will appear in the "tenpass_results" directory over the course of 4 days.
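The idea behind a 10-pass run is ordinary 10-fold cross-validation: each pass holds one bucket out, fits scores on the other nine, and checks them against the held-out bucket. The shell sketch below only enumerates the folds; it is hypothetical, it is not the contents of tenpass/10pass-run, and the `train_buckets` helper is made up for illustration.

```shell
#!/bin/sh
# Sketch of the 10-fold cross-validation pattern (hypothetical, NOT the real
# tenpass/10pass-run): for fold $id, train on the other 9 buckets and hold
# bucket $id out for testing.
folds=10

train_buckets() {
    # $1 = held-out fold id; prints the ids of the training buckets.
    held=$1
    i=1
    out=""
    while [ "$i" -le "$folds" ]; do
        [ "$i" -ne "$held" ] && out="$out $i"
        i=$((i + 1))
    done
    echo $out    # unquoted on purpose: drops the leading space
}

id=1
while [ "$id" -le "$folds" ]; do
    echo "fold $id: train on [$(train_buckets "$id")], test on [$id]"
    # A real pass would optimise scores on the training buckets, save them
    # (e.g. as tenpass_results/scores.$id), then evaluate them on bucket $id.
    id=$((id + 1))
done
```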
...
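
    # clean up earlier builds, logging output to make.output
    make clean >> make.output
    make -C perceptron_c clean >> make.output
    # regenerate tmp/tests.h (stderr goes to the log too)
    make tmp/tests.h >> make.output 2>&1
    # give the perceptron build its own copy of tmp/
    rm -rf perceptron_c/tmp; cp -r tmp perceptron_c/tmp
    # build the perceptron, then run it from its own directory
    make -C perceptron_c >> make.output
    ( cd perceptron_c ; ./perceptron )
    # note where and when the pass finished
    pwd; date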
Change

    cp craig-evolve.scores tenpass_results/scores.$id

to

    cp perceptron_c/perceptron.scores tenpass_results/scores.$id

and run ./10pass-run-perceptron. This one runs faster.
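If 10pass-run-perceptron is made as a copy of 10pass-run, the `cp` substitution can be scripted with sed. This is a hypothetical helper that makes only that one change; any other edits to the script (such as the build commands) are not handled.

```shell
#!/bin/sh
# Hypothetical helper: copy a script, swapping the GA scores file for the
# perceptron's output in the "cp ... tenpass_results/scores.$id" line.
# Only this one substitution is made.
make_perceptron_variant() {
    # $1 = original script, $2 = new script
    sed 's#cp craig-evolve\.scores #cp perceptron_c/perceptron.scores #' "$1" > "$2"
    chmod +x "$2"
}

# Demo on a throwaway one-line script ($id stays literal inside single quotes).
tmp=$(mktemp -d)
printf '%s\n' 'cp craig-evolve.scores tenpass_results/scores.$id' > "$tmp/10pass-run"
make_perceptron_variant "$tmp/10pass-run" "$tmp/10pass-run-perceptron"
cat "$tmp/10pass-run-perceptron"
```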