...

You will need giftopnm, jpegtopnm and pngtopnm (from netpbm), imagemagick and gocr installed.

Additionally, you will need the perl module

...

  • Several bugfixes
  • New debug system
  • Logfile support
  • Proper error handling for most errors

Version 2.3

  • Multiple scans with different pnm preprocessing and gocr arguments possible
  • Support for interlaced gifs
  • Support for animated gifs
  • Temporary file handling reorganized
  • External wordlist support
  • Personalized wordlist support
  • Spaces are now stripped from wordlist words and OCR results before matching
  • Experimental MD5 Database feature

Installation

Attention: If you need help installing this plugin or have other questions, please use the mailing list created for this plugin or contact me on IRC (see the end of this page for more information).

The mailing list can be found at http://lists.own-hero.net/mailman/listinfo/devel-spam

Since version 2.3, the tarball contains an INSTALL file and a FAQ file. Read both for installation instructions.

The following information is somewhat older and may no longer be accurate for version 2.3. Most new parameters are not covered here.

Download the tarball (see How to Obtain) to your SpamAssassin configuration directory and unpack it to /etc/mail/spamassassin/. (You may choose another location, but you will then have to make all the necessary adjustments to the configuration file yourself.) Open FuzzyOcr.cf and extend the wordlist as you wish. If the helper binaries are in a different location than the defaults specified in the config file, change those entries to the correct paths.

...

Explanation of the additional options:

focr_tmp_path - String determining the absolute path to a directory where the plugin may write temporary files (without trailing slash)
focr_logfile - String determining the file to send log messages to. Make sure this is writable!
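As an illustration, the two options might appear in FuzzyOcr.cf roughly as follows. This is only a sketch: the paths are hypothetical placeholders, not defaults shipped with the plugin, and the INSTALL file in the tarball remains the authoritative reference.

```
# Hypothetical example values; adjust both paths to your system.
# Directory for temporary files: absolute path, no trailing slash.
focr_tmp_path /tmp
# Log file: must be writable by the user SpamAssassin runs as.
focr_logfile /var/log/FuzzyOcr.log
```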

...

  • The case is not relevant
  • All special characters, spaces or numbers are stripped before any matching is done
  • Your wordlist word will be found even if it is inside another word (submatching)
  • The distance is calculated from the number of character additions, deletions and substitutions needed to turn one string into the other.
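The matching rules above can be sketched in a few lines of Python. This is not the plugin's actual code (the plugin is a Perl module), only a minimal illustration of the idea: both strings are lowercased and reduced to letters, and the word is compared against every substring of the OCR text using an edit distance. The function names and the absolute `max_distance` threshold are assumptions made for this example.

```python
import re


def normalize(s: str) -> str:
    """Lowercase and drop everything except the letters a-z."""
    return re.sub(r"[^a-z]", "", s.lower())


def min_substring_distance(word: str, text: str) -> int:
    """Smallest edit distance between `word` and any substring of `text`."""
    # prev[j]: best distance of the word prefix processed so far against
    # some substring of `text` ending at position j. The first row is all
    # zeros because a match may start anywhere in the text (submatching).
    prev = [0] * (len(text) + 1)
    for i, wc in enumerate(word, start=1):
        cur = [i] + [0] * len(text)
        for j, tc in enumerate(text, start=1):
            cur[j] = min(
                prev[j] + 1,               # character of word missing in text
                cur[j - 1] + 1,            # extra character in text
                prev[j - 1] + (wc != tc),  # substitution (0 if equal)
            )
        prev = cur
    return min(prev)


def fuzzy_match(word: str, ocr_text: str, max_distance: int) -> bool:
    """True if `word` occurs in `ocr_text` within `max_distance` edits.

    `max_distance` is a hypothetical absolute threshold for this sketch.
    """
    w, t = normalize(word), normalize(ocr_text)
    if not w:
        return False
    return min_substring_distance(w, t) <= max_distance
```

For instance, `fuzzy_match("Viagra", "Buy V1AGRA now!", 1)` succeeds in this sketch: after normalization the OCR text becomes `buyvagranow`, and the word is one deletion away from the substring `vagra`.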

...

  • The words checked for are specific to some spam I have received a lot of recently.
  • gocr can take up quite a bit of resources, so be careful. However, it is only executed for messages that contain GIF, PNG or JPEG attachments.

ToDo

  • Rework animated gif handling
  • Replace plain MD5 database with a DBM file
  • Avoid usage of tmp files for gocr, redirect output directly back to the script

– Author: Christian Holler, decoder_at_own-hero_dot_net

...