registering documents other than emails

David Relson relson at osagesoftware.com
Sat Dec 3 02:22:45 CET 2005


On Fri, 2 Dec 2005 15:04:50 -0800
Chris Fortune wrote:

> Hello,
> 
> I often need to register (as ham) a single large text file that is a mix of emails and other business documents.  Will bogofilter
> parse it correctly?  Is there a command line switch to force it not to treat the document as an email?
> 
> Chris

The short answer is "No", though you might try the "-H" switch and see
if the results are adequate for your purpose.

In normal operation, bogofilter tags the contents of header lines.  For
example, tokens from the "Subject:" line are given a tag (prefix) of
"subj:".  Thus the line "Subject: this is a test" produces the tokens
"subj:this" and "subj:test" (the short words "is" and "a" are not kept).
The "-H" switch disables this special processing, so the two tokens are
"this" and "test".
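For anyone who wants the idea without a bogofilter install, here is a toy
shell sketch of the tagging behaviour just described.  It is NOT
bogofilter's lexer: the function name tokenize_subject is hypothetical, it
only looks at a "Subject:" line, and the 3-character minimum is an
assumption chosen to match the example ("is" and "a" are dropped).
bogolexer remains the way to see what bogofilter really does.

```shell
#!/usr/bin/env bash
# Toy illustration only (hypothetical function, NOT bogofilter's lexer).
# tokenize_subject LINE [notag]
#   LINE  : a header line such as "Subject: this is a test"
#   notag : emit bare tokens, roughly what the -H switch gives you
# Assumption: words shorter than 3 characters are discarded.
tokenize_subject() {
    local line=$1 mode=${2:-tag}
    local prefix="subj:"
    [ "$mode" = "notag" ] && prefix=""

    # Drop the header name, then split the rest on whitespace
    # (read -a avoids accidental filename globbing).
    local body=${line#Subject:}
    local -a words
    read -r -a words <<< "$body"

    local word
    for word in "${words[@]}"; do
        if [ "${#word}" -ge 3 ]; then
            printf '%s%s\n' "$prefix" "$word"
        fi
    done
}

tokenize_subject "Subject: this is a test"        # tagged tokens
tokenize_subject "Subject: this is a test" notag  # untagged tokens
```

The first call prints "subj:this" and "subj:test"; the second prints "this"
and "test", mirroring the difference the -H switch makes.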

The easiest way to see how a message (or a document) is parsed is to
use bogolexer.  The above discussion can be demonstrated with:

  echo "Subject: this is a test" | bogolexer -p
  echo "Subject: this is a test" | bogolexer -p -H

HTH,

David



More information about the Bogofilter mailing list