registering documents other than emails
relson at osagesoftware.com
Fri Dec 2 20:22:45 EST 2005
On Fri, 2 Dec 2005 15:04:50 -0800
Chris Fortune wrote:
> I often need to register (as ham) a single large text file that is a mix of emails and other business documents. Will bogofilter
> parse it correctly? Is there a command line switch to force it to not treat the document as an email?
The short answer is "No", though you might try the "-H" switch and see
if the results are adequate for your purpose.
In normal operation, bogofilter tags the contents of header lines. For
example, tokens from the "Subject:" line are given a tag (prefix) of
"subj:". Thus the line "Subject: this is a test" produces the tokens
"subj:this" and "subj:test" (the short words "is" and "a" are not kept
as tokens). The "-H" switch disables this special processing, so the
two tokens are "this" and "test".
The easiest way to see how a message (or a document) is parsed is to
use bogolexer. The above discussion can be demonstrated with:

    echo "Subject: this is a test" | bogolexer -p
    echo "Subject: this is a test" | bogolexer -p -H
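
If bogolexer is not at hand, the tagging rule described above can be
imitated with a toy sketch in plain shell. This is not bogofilter's
lexer: toy_tokens is a hypothetical helper, it only knows about the
"Subject:" header, and the three-character minimum token length is an
assumption inferred from the example, where "is" and "a" do not appear.

```shell
# Toy imitation of the header tagging described above; NOT bogofilter's lexer.
# toy_tokens is a hypothetical helper. Pass "plain" as $1 to mimic "-H",
# anything else to get the tagged form.
toy_tokens() {
    awk -v mode="$1" '
    {
        prefix = ""
        line = $0
        # Only the Subject: header is handled in this sketch.
        if (tolower(line) ~ /^subject:/) {
            if (mode != "plain") prefix = "subj:"
            sub(/^[^:]*:/, "", line)    # drop the header name itself
        }
        n = split(tolower(line), words, /[^a-z0-9]+/)
        for (i = 1; i <= n; i++)
            # Assumption: tokens shorter than 3 characters are dropped.
            if (length(words[i]) >= 3) print prefix words[i]
    }'
}

echo "Subject: this is a test" | toy_tokens tagged
echo "Subject: this is a test" | toy_tokens plain
```

The first command prints the tagged tokens and the second the untagged
ones, which is the same difference the two bogolexer commands show.
Real bogolexer output should be treated as authoritative wherever the
two disagree.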