truncating input?

David Relson relson at
Thu Jul 29 10:52:13 EDT 2004

On Thu, 29 Jul 2004 01:23:26 -0700
Chris Fortune wrote:

> I seem to remember that bogofilter only looks at the first x number of
> characters (or was it lines?) in an email.   Is this true of both
> registration and classification?   I'm thinking of a simple
> optimization of truncating mail in the training corpus to save space,
> cpu cycles and  bandwidth.

Hi Chris,

Your memory is faulty:-(

Bogofilter looks at the whole message, typically presented on stdin.  It
processes both header and body, including multipart MIME sections.
Attachments like tgz files and images are assumed to be binary and are
ignored.
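To picture what "the whole message" means for a multipart mail, here is a minimal Python sketch built on the standard `email` package. It is not bogofilter's code (bogofilter has its own lexer). The function name `text_to_tokenize` is hypothetical, and skipping every part whose type is not `text/*` is an assumption standing in for bogofilter's handling of binary attachments.

```python
from email import message_from_string
from email.message import Message


def text_to_tokenize(raw: str) -> str:
    """Hypothetical illustration: gather top-level headers plus the decoded
    bodies of text parts, skipping non-text parts (images, tgz files, ...)."""
    msg: Message = message_from_string(raw)
    # Headers are kept, since the reply says both header and body are read.
    chunks = [f"{name}: {value}" for name, value in msg.items()]
    # walk() yields the message itself and every nested MIME part.
    for part in msg.walk():
        if part.is_multipart():
            continue  # container parts have no payload of their own
        if part.get_content_maintype() != "text":
            continue  # assumption: binary attachments are not tokenized
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"
        try:
            chunks.append(payload.decode(charset, errors="replace"))
        except LookupError:  # unknown charset name in the header
            chunks.append(payload.decode("latin-1"))
    return "\n".join(chunks)


if __name__ == "__main__":
    sample = (
        "From: a@example.com\n"
        "Subject: hi\n"
        "MIME-Version: 1.0\n"
        'Content-Type: multipart/mixed; boundary="XX"\n'
        "\n"
        "--XX\n"
        "Content-Type: text/plain\n"
        "\n"
        "cheap pills\n"
        "--XX\n"
        "Content-Type: image/png\n"
        "Content-Transfer-Encoding: base64\n"
        "\n"
        "iVBORw0KGgo=\n"
        "--XX--\n"
    )
    print(text_to_tokenize(sample))
```

The text part and the headers survive; the base64 image part never reaches the output.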

If space and time are considerations, you could use "head" to trim the
message.  Remember that giving bogofilter less text to work with will
lower its accuracy.
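A Python equivalent of the "head" trimming, working on raw bytes the way `head -n` does, might look like the sketch below. The name `head_lines` and the 200-line default are hypothetical; pick the cutoff yourself, keeping the accuracy cost in mind.

```python
def head_lines(raw: bytes, max_lines: int = 200) -> bytes:
    """Return the first max_lines newline-terminated lines of raw,
    mirroring `head -n max_lines` (hypothetical helper, illustrative default)."""
    if max_lines < 0:
        raise ValueError("max_lines must be non-negative")
    end = 0
    for _ in range(max_lines):
        newline = raw.find(b"\n", end)
        if newline == -1:
            return raw  # fewer lines than the limit: keep everything
        end = newline + 1
    return raw[:end]


if __name__ == "__main__":
    message = b"Subject: x\n\nline1\nline2\nline3\n"
    print(head_lines(message, 2))
```

For training, the trimmed message could then be piped to bogofilter in the usual way, e.g. `head -n 200 msg | bogofilter -s` to register spam (`-n` for non-spam). Cutting by line count keeps the headers, which come first, but can stop partway through a MIME part.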