registering documents other than emails

Chris Fortune cfortune at telus.net
Sat Dec 3 06:19:42 CET 2005


> On Fri, 2 Dec 2005 15:04:50 -0800
> Chris Fortune wrote:
>
> > Hello,
> >
> > I often need to register (as ham) a single large text file that is a
> > mix of emails and other business documents.  Will bogofilter parse it
> > correctly?  Is there a command line switch to force it not to treat
> > the document as an email?
> >
> > Chris
>
> The short answer is "No", though you might try the "-H" switch and see
> if the results are adequate for your purpose.
>
> In normal operation, bogofilter tags the contents of header lines, for
> example tokens from the "Subject:" line are given a tag (prefix) of
> "subj:".  Thus the line "Subject: this is a test" produces tokens
> "subj:this" and "subj:test".  The "-H" switch disables this special
> processing, so the two tokens are "this" and "test".
>
> The easiest way to see how a message (or a document) is parsed is to
> use bogolexer.  The above discussion can be demonstrated with:
>
>   echo "Subject: this is a test" | bogolexer -p
>   echo "Subject: this is a test" | bogolexer -p -H
>
> HTH,
>
> David
>

If there are several emails in one document, will it work the same way
(tagging every header line), or will it tag just the first (or last)
headers it finds?  Or will it treat the entire document as if it were
one big email (tagging the first headers and treating the rest as body
content)?
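
One way to check this empirically, following the bogolexer suggestion
above, might look like the sketch below.  It assumes bogolexer is on the
PATH and accepts "-p" as in the quoted examples; the sample addresses,
subjects, and mbox-style "From " separator lines are made up for
illustration.  Any "subj:" tokens in the output would show which
Subject: lines were treated as headers.

```shell
# Build a small test file holding two mbox-style messages (contents are invented).
sample=$(mktemp)
cat > "$sample" <<'EOF'
From alice@example.com Fri Dec  2 15:00:00 2005
Subject: first message

body one

From bob@example.com Fri Dec  2 16:00:00 2005
Subject: second message

body two
EOF

# Count the Subject: lines so we know how many header blocks the file contains.
subject_count=$(grep -c '^Subject:' "$sample")
echo "Subject lines in sample: $subject_count"

# If bogolexer is installed, show only the tokens carrying the "subj:" tag.
# Tokens from both Subject: lines would mean both header blocks were tagged;
# tokens from only the first would mean the rest was parsed as body text.
if command -v bogolexer >/dev/null 2>&1; then
    bogolexer -p < "$sample" | grep 'subj:' || echo "no subj: tokens found"
else
    echo "bogolexer not found; skipping token dump"
fi
```

Repeating the run with "-H" added, or with the "From " separator lines
removed from the sample, should show how much the result depends on the
file looking like a mailbox.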



