relson at osagesoftware.com
Thu Jul 29 10:52:13 EDT 2004
On Thu, 29 Jul 2004 01:23:26 -0700
Chris Fortune wrote:
> I seem to remember that bogofilter only looks at the first x number of
> characters (or was it lines?) in an email. Is this true of both
> registration and classification? I'm thinking of a simple
> optimization of truncating mail in the training corpus to save space,
> cpu cycles and bandwidth.
Your memory is faulty:-(
Bogofilter looks at the whole message, typically presented on stdin. It
processes both header and body, including multi-part mime sections.
Attachments like tgz files and images are assumed to be binary and are ignored.
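To make that behaviour concrete, here is a minimal Python sketch. It is not bogofilter's code (bogofilter is written in C). It assumes "binary" means any MIME part whose content type is not text/*, and the function name `text_parts` is hypothetical. It walks every part of a message, skips multipart containers and non-text leaves, and returns the decoded text a filter would get to look at.

```python
from email import message_from_string
from email.message import Message


def text_parts(raw: str) -> list[str]:
    """Return the decoded text of every text/* leaf part of a message.

    Hypothetical illustration: text parts (including those nested inside
    multipart MIME) are kept, while parts such as images or .tgz archives
    are treated as binary and skipped. Not bogofilter's actual lexer.
    """
    msg: Message = message_from_string(raw)
    texts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # container parts have no body of their own
        if part.get_content_maintype() != "text":
            continue  # assumption: non-text (image/*, application/*) is binary
        # decode=True undoes base64 / quoted-printable transfer encodings
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "us-ascii"
        texts.append(payload.decode(charset, errors="replace"))
    return texts
```

For a multipart/mixed message with a text/plain body and a gzip attachment, only the text/plain body comes back. Note that the message headers are read too (via `msg`), but this sketch returns only body text.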
If space and time are considerations, you could use "head" to trim the
message. Remember that giving bogofilter less text to work with will
lower its accuracy.
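For a training corpus stored one message per file, the trimming could be scripted. The sketch below is a hypothetical Python equivalent of `head -n N`; the names `head_lines` and `trim_corpus_file` and the line limit are illustrative assumptions, not part of bogofilter. It works on raw bytes so line endings and non-ASCII content are preserved.

```python
import io
from itertools import islice
from pathlib import Path


def head_lines(raw: bytes, max_lines: int) -> bytes:
    """Return the first max_lines lines of raw, like `head -n max_lines`.

    Works on bytes so that line endings and non-ASCII content in the
    stored message are left exactly as they were.
    """
    if max_lines < 0:
        raise ValueError("max_lines must be non-negative")
    # Iterating a BytesIO yields lines split on b"\n", keeping the newline.
    return b"".join(islice(io.BytesIO(raw), max_lines))


def trim_corpus_file(src: Path, dst: Path, max_lines: int) -> None:
    """Write a truncated copy of the message in src to dst (hypothetical helper)."""
    dst.write_bytes(head_lines(src.read_bytes(), max_lines))
```

For example, `trim_corpus_file(Path("spam/0001.eml"), Path("trimmed/0001.eml"), 200)` keeps the first 200 lines (the paths are placeholders). A plain line cut can stop in the middle of a MIME part and drop its closing boundary, and every line removed is text bogofilter can no longer learn from, so the limit trades accuracy for space.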