Here's some advanced information on what happens behind the scenes with the different rule types.
Body rules: body RULENAME /foo/
...
- Pattern /clause/ would result in 5 rule hits.
- Pattern /^./ would result in 3 rule hits.
When a rule has to match extended characters or diacritics, always write the pattern so that it matches both the ISO-8859-1 and the UTF-8 encoding of the character.
The body content that a rule sees can differ depending on the normalize_charset setting. For example, when matching "fügen":
- body FOO /fügen/ (BAD, pattern holds the single ISO-8859-1 byte \xfc)
  - Does not work when normalize_charset is 1 and the mail is converted from ISO-8859-1 to UTF-8.
- body FOO /fÃŒgen/ (BAD, pattern holds the two UTF-8 bytes \xc3\xbc)
  - Does not work when normalize_charset is 0 and the mail is ISO-8859-1.
- body FOO /f(?:\xfc|\xc3\xbc)gen/
  - Works for both encodings. The rule file is also portable, because it contains only ASCII and no longer depends on the encoding it is saved in.
- To find the hex bytes, use one of the UTF-8 / ASCII table tools available online, or convert with perl (the output depends on the encoding your terminal uses for "ü"):
  - perl -MEncode -e 'print unpack("H*",encode("UTF-8","ü"))."\n"'
  - perl -e 'print unpack("H*","ü")."\n"'
- You can also use one of the replace tags defined in the default ruleset, which match several variations of a character:
  - body FOO /f<U>gen/
  - replace_rules FOO
Because the body is processed as raw bytes, Unicode regex features such as \p{} cannot currently be used.
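The encoding examples above can be checked at the byte level. The following is a minimal sketch in Python, used here only as a stand-in for Perl regex matching on raw bytes; it is not how SpamAssassin itself is implemented. The names latin1_mail, utf8_mail, patterns and hits are made up for the illustration.

```python
import re

# "fügen", written with an escape so this file itself stays pure ASCII.
WORD = "f\u00fcgen"

# The same word as raw bytes in the two encodings discussed above.
latin1_mail = WORD.encode("iso-8859-1")  # b'f\xfcgen'
utf8_mail = WORD.encode("utf-8")         # b'f\xc3\xbcgen'

# Byte patterns corresponding to the three rule variants (hypothetical names).
patterns = {
    "latin1_literal": re.compile(b"f\xfcgen"),
    "utf8_literal": re.compile(b"f\xc3\xbcgen"),
    "portable": re.compile(rb"f(?:\xfc|\xc3\xbc)gen"),
}


def hits(pattern_name):
    """Return which of the two encoded bodies the named pattern matches."""
    rx = patterns[pattern_name]
    return {
        "latin1": bool(rx.search(latin1_mail)),
        "utf8": bool(rx.search(utf8_mail)),
    }


if __name__ == "__main__":
    for name in patterns:
        print(name, hits(name))
```

Each literal pattern matches only the body in its own encoding, while the alternation matches both, which is why the hex-escaped form is the one to prefer.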
Rawbody rules: rawbody RULENAME /foo/
...
- When using anchoring (/^foo/), the pattern matches only at the start of a chunk.
- That is, if a part is larger than 1-4kB, its beginning cannot be matched with 100% accuracy, because the anchor can also match at the start of a later chunk.
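The effect of anchoring on chunks can be sketched as follows. This is a simplified model in Python, not SpamAssassin's actual chunking code: split_chunks and anchored_hits are hypothetical helpers, and the chunk size is a tiny 8 bytes so the effect is visible (the text above speaks of 1-4kB).

```python
import re

# A hypothetical part of 24 bytes; "foo" appears at offsets 0, 8 and 19.
DATA = b"foo bar " + b"foo baz " + b"xx foo  "


def split_chunks(data, size):
    """Hypothetical chunker: cut data into consecutive pieces of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def anchored_hits(data, pattern, size):
    """Return the indexes of the chunks where the ^-anchored pattern matches."""
    rx = re.compile(b"^" + pattern)
    return [i for i, chunk in enumerate(split_chunks(data, size)) if rx.search(chunk)]


if __name__ == "__main__":
    # With 8-byte chunks the anchor matches in chunk 0 and also in chunk 1.
    print(anchored_hits(DATA, b"foo", 8))
    # With the whole part in one chunk only the real beginning matches.
    print(anchored_hits(DATA, b"foo", 24))
```

In this model the second "foo" sits in the middle of the part, yet the anchored pattern hits it because it happens to fall at a chunk boundary. That is the inaccuracy described above.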
Header rules: header RULENAME Header =~ /foo/
If there are multiple headers named "Header", the matched string contains each of them, newline-separated, starting from the first (topmost).
- If Header:raw is used, all whitespace and newlines are preserved. Multiple headers are again concatenated into the same matching string.
- When using anchors (/^foo/), add the m modifier (/^foo/m) if any of the duplicate headers should be able to match. Without it, only the first header (line) can match.
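The anchoring behaviour with duplicate headers can be illustrated with a small sketch. It uses Python's re.MULTILINE flag, which corresponds to Perl's m modifier. The header values and the names matched_string and matches are invented for the example, and the joining is a simplified model of the description above.

```python
import re

# Three hypothetical headers with the same name, topmost first.
headers = ["bar one", "foo two", "foo three"]

# Duplicate headers end up newline-separated in a single matching string.
matched_string = "\n".join(headers)


def matches(pattern, flags=0):
    """Return True if the pattern matches anywhere in the joined header string."""
    return bool(re.search(pattern, matched_string, flags))


if __name__ == "__main__":
    # Without the m modifier, ^ anchors only at the very start of the string.
    print(matches(r"^foo"))
    # With the m modifier, ^ also anchors after each newline.
    print(matches(r"^foo", re.MULTILINE))
```

Without the flag, /^foo/ only sees the topmost header ("bar one") at the anchor position and fails. With it, the second and third headers can match as well.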