corpus attachments question

Gyepi SAM gyepi at praxis-sw.com
Mon Sep 15 18:11:56 CEST 2003


On Mon, Sep 15, 2003 at 11:23:19AM -0400, Matt Garretson wrote:
> I'm interested in reducing the size (in MB) of my ham corpus by trimming
> many large binary MIME attachments from the ham messages.  From what I
> can tell, bogofilter skips non-text and non-html attachments, so my guess
> is that removing other attachments (e.g. image/jpeg, application/msword)
> should not change the tokens found in the message.  Is that correct?

Yes.
But bogofilter does parse the MIME headers that accompany the attachments, so
leave those in. Unfortunately, most tools that remove MIME components *will*
also remove the MIME headers, so you may have to write a script to solve
the problem.
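
Such a script could look roughly like the sketch below. It assumes Python
and its standard email package; the function name strip_attachment_bodies
is made up for illustration and is not part of bogofilter. It empties the
body of every part whose main type is not text, and leaves each part's own
MIME headers (Content-Type, Content-Transfer-Encoding and so on) in place.

```python
import email


def strip_attachment_bodies(raw: bytes) -> bytes:
    """Empty the bodies of non-text MIME parts but keep their headers.

    Hypothetical helper: a sketch of the idea, not a bogofilter tool.
    """
    msg = email.message_from_bytes(raw)
    for part in msg.walk():
        if part.is_multipart():
            # Containers (multipart/*, message/rfc822) have no body of
            # their own; walk() visits their children separately.
            continue
        if part.get_content_maintype() == "text":
            # text/plain, text/html, etc. are left untouched.
            continue
        # Drop the encoded attachment data; the part's headers are
        # stored separately and are not affected by this call.
        part.set_payload("")
    return msg.as_bytes()
```

To use it, read one message as bytes, pass it through the function, and
write the result to the trimmed corpus. Splitting an mbox file into
individual messages is left out here. It is probably worth comparing the
tokens of a few original and trimmed messages before converting the whole
corpus.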

-Gyepi



