Nightly MassChecks by Uploading your Corpora


Corpus uploads are no longer used, for resource and privacy reasons.


How to participate in NightlyMassCheck runs by uploading your email.

...

These corpora can be mass-checked nightly; results are also available at http://bbmass.spamassassin.org:8011/.

Details for PMC members on how to set up new accounts are at NewUploadedCorporaUser.

Administrivia: how the corpus is laid out

...

No Format
/home/bbmass/uploadedcorpora/WHO/TYPE/FOLDER

"WHO" is the username of the person who submitted the corpus via rsync, e.g. "doc", "jm", "zmi".

Under that, we have "TYPE", which is either "ham" or "spam".
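The layout above can be walked with a short shell loop, for example to see who has uploaded what. This is a minimal sketch, not part of the existing tooling: the function name list_corpora is hypothetical, and it only counts the top-level entries (the FOLDER level) under each WHO/ham and WHO/spam directory.

```shell
# Hypothetical helper: print "WHO TYPE COUNT" for every WHO/ham and WHO/spam
# directory under the uploaded-corpora root, where COUNT is the number of
# entries directly inside that TYPE directory (the FOLDER level).
list_corpora() {
    root="${1:-/home/bbmass/uploadedcorpora}"
    for typedir in "$root"/*/ham "$root"/*/spam; do
        # Skip unmatched glob patterns and non-directories.
        [ -d "$typedir" ] || continue
        who=$(basename "$(dirname "$typedir")")
        type=$(basename "$typedir")
        count=$(find "$typedir" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' ')
        printf '%s %s %s\n' "$who" "$type" "$count"
    done
}
```

Run with no argument it reads /home/bbmass/uploadedcorpora; pass another directory to inspect a copy elsewhere.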

...

How to get your corpus up there

This is done via rsync.

Only PMC members have the privileges to create an rsync area for you to upload to, so send an email to private@spamassassin.apache.org requesting an rsync account for uploading corpora. (If you're on the PMC, just SSH in and copy over a tarball yourself, or create yourself an rsync account using a random password.)

Once they've done this, they'll send you a username and password; you can then sync your files like so:

No Format
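  # The trailing slash on the source copies the ham/ and spam/ dirs to the
  # top of your upload area, rather than into a "files" subdirectory.
  export RSYNC_PASSWORD="$YOURPASS"
  rsync -vr /path/to/your/files/ \
      "rsync://$YOURUSER@rsync.spamassassin.org/mailcorpus_$YOURUSER"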

(where $YOURPASS and $YOURUSER are the password and username the PMC member mailed to you.)
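Before the first real upload it can help to preview what rsync would send. The sketch below wraps the upload command in a hypothetical sync_corpus function that refuses to run without credentials and, when the (equally hypothetical) DRY_RUN variable is set to 1, passes rsync's standard --dry-run flag so nothing is transferred. It also normalizes the source to end in a single slash, so rsync copies the contents of the directory (ham/ and spam/) rather than the directory itself.

```shell
# Hypothetical wrapper around the rsync upload. Assumes YOURUSER and YOURPASS
# hold the credentials the PMC member mailed to you.
sync_corpus() {
    # Strip one trailing slash, then add exactly one: sync the dir's contents.
    src="${1%/}/"
    if [ -z "$YOURUSER" ] || [ -z "$YOURPASS" ]; then
        echo "sync_corpus: set YOURUSER and YOURPASS first" >&2
        return 1
    fi
    # Build rsync's option list in the function's positional parameters.
    set -- -vr
    [ "${DRY_RUN:-0}" = 1 ] && set -- "$@" --dry-run
    # RSYNC_PASSWORD is set only for this one rsync invocation.
    RSYNC_PASSWORD="$YOURPASS" rsync "$@" "$src" \
        "rsync://$YOURUSER@rsync.spamassassin.org/mailcorpus_$YOURUSER"
}
```

For example, run `DRY_RUN=1 sync_corpus /path/to/your/files` first, then run it again without DRY_RUN once the listed files look right.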

The /path/to/your/files directory must contain two subdirectories, ham and spam. Any files ending in .mbox inside those directories are treated as UNIX mbox-format files; any other files are treated as individual messages (one message per file).
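A quick local check before uploading can catch a missing ham or spam directory and show how your files will be classified. This is a sketch using a hypothetical check_corpus function; it applies the .mbox suffix rule described above, and it counts files recursively with find, which is an assumption about how deeply your folders are nested.

```shell
# Hypothetical pre-upload check: require DIR/ham and DIR/spam, then report how
# many *.mbox files (many messages each) and other files (one message each)
# each directory contains, including files in subdirectories.
check_corpus() {
    dir="${1%/}"
    for type in ham spam; do
        if [ ! -d "$dir/$type" ]; then
            echo "check_corpus: missing $dir/$type" >&2
            return 1
        fi
        mboxes=$(find "$dir/$type" -type f -name '*.mbox' | wc -l | tr -d ' ')
        singles=$(find "$dir/$type" -type f ! -name '*.mbox' | wc -l | tr -d ' ')
        printf '%s: %s mbox file(s), %s single-message file(s)\n' \
            "$type" "$mboxes" "$singles"
    done
}
```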

...

Uploaded corpora are not considered public knowledge. The people with accounts on that machine should treat the uploaded messages responsibly and respect the uploader's privacy. If you are concerned about the privacy of these messages, you may want to remove the more private mails before uploading, or run mass-check on your own machine instead.

...

How we create a new rsync area for someone to upload corpora

...

Some stuff for PMC people hacking on this...

...

No Format
CORPUSUSER="[username you want to give out]"
cd /home/bbmass/uploadedcorpora/
mkdir "$CORPUSUSER"
# 1777: world-writable, with the sticky bit set
chmod 1777 "$CORPUSUSER"
# Permissions changed on the new spamassassin-vm box:
chown rsync:rsync "$CORPUSUSER"

Then create a random password string, and add a line to /home/corpus-rsync/secrets with $CORPUSUSER and that password.
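The password step can be scripted. rsync's daemon reads a secrets file as one username:password pair per line, so this sketch assumes /home/corpus-rsync/secrets is the file named by the daemon's secrets file setting. The add_corpus_user helper name and the 32-hex-character password length are assumptions. It appends the line and prints the password so it can be mailed to the new user. It leaves the file's permissions alone; note that with rsyncd's default strict modes, the secrets file must not be readable by any user other than the one the daemon runs as.

```shell
# Hypothetical helper: append "USER:PASSWORD" to the rsync secrets file and
# print the generated password.
add_corpus_user() {
    user="$1"
    secrets="${2:-/home/corpus-rsync/secrets}"
    if [ -z "$user" ]; then
        echo "usage: add_corpus_user USER [SECRETS]" >&2
        return 1
    fi
    # 16 random bytes from /dev/urandom, hex-encoded to 32 characters.
    pass=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')
    printf '%s:%s\n' "$user" "$pass" >> "$secrets"
    printf '%s\n' "$pass"
}
```

For example, `add_corpus_user "$CORPUSUSER"` right after creating the directory.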

...