truncating input?

David Relson relson at osagesoftware.com
Thu Jul 29 16:52:13 CEST 2004


On Thu, 29 Jul 2004 01:23:26 -0700
Chris Fortune wrote:

> I seem to remember that bogofilter only looks at the first x number of
> characters (or was it lines?) in an email. Is this true of both
> registration and classification? I'm thinking of a simple
> optimization of truncating mail in the training corpus to save space,
> CPU cycles and bandwidth.

Hi Chris,

Your memory is faulty:-(

Bogofilter looks at the whole message, typically presented on stdin.  It
processes both header and body, including multipart MIME sections.
Attachments like tgz files and images are assumed to be binary and are
ignored.

If space and time are considerations, you could use "head" to trim the
message before passing it to bogofilter.  Remember that giving
bogofilter less text to work with will lower its accuracy.
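
If a plain "head" cuts too bluntly, here is a minimal sketch in Python of
one alternative: keep the whole header and only the first few lines of
the body.  The function name truncate_message and the max_body_lines
parameter are my own inventions for illustration; they are not part of
bogofilter.  The sketch also knows nothing about MIME, so a multipart
message may be cut in the middle of a part.

```python
def truncate_message(text: str, max_body_lines: int = 50) -> str:
    """Return the full header block plus at most max_body_lines of body.

    Hypothetical helper for illustration only; not part of bogofilter.
    The header block is taken to end at the first blank line.
    """
    kept = []
    in_headers = True
    body_lines = 0
    for line in text.splitlines(keepends=True):
        if in_headers:
            # Keep every header line, including the blank separator line.
            kept.append(line)
            if line.strip() == "":
                in_headers = False
        else:
            # Stop as soon as the body-line budget is used up.
            if body_lines >= max_body_lines:
                break
            kept.append(line)
            body_lines += 1
    return "".join(kept)
```

The truncated text could then be written to bogofilter's stdin in place
of the original message.  The same caveat applies: the fewer body lines
you keep, the less bogofilter has to work with.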

HTH,

David


