corpus attachments question

Matt Garretson mattg at assembly.state.ny.us
Mon Sep 15 17:23:19 CEST 2003


Hi, sorry in advance for what may be a dumb question.

I'm interested in reducing the size (in MB) of my ham corpus by trimming
the many large binary MIME attachments from the ham messages.  From what I
can tell, bogofilter skips non-text and non-HTML attachments, so my guess
is that removing other attachments (e.g. image/jpeg, application/msword)
should not change the tokens found in a message.  Is that correct?  At
least, it seems to be the case from a few before-and-after runs using the
-vvv option.
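
For concreteness, here is a minimal sketch of the kind of trimming
described above, using Python's standard email package.  The function
names are hypothetical, and the rule it applies (keep text/* parts, drop
every other leaf part of a multipart message) is only the assumption
stated in this post, not confirmed bogofilter behaviour.

```python
import email


def strip_binary_attachments(raw_bytes):
    """Return the message bytes with non-text attachments removed.

    Assumption: only text/* parts (text/plain, text/html, ...) matter
    for tokenizing; everything else inside a multipart is dropped.
    """
    msg = email.message_from_bytes(raw_bytes)
    _prune(msg)
    return msg.as_bytes()


def _prune(part):
    # Only multipart containers hold attachments to trim; a
    # single-part message is left exactly as it was.
    if part.get_content_maintype() != "multipart":
        return
    kept = []
    for sub in part.get_payload():
        maintype = sub.get_content_maintype()
        if maintype == "multipart":
            _prune(sub)          # recurse into nested containers
            kept.append(sub)
        elif maintype in ("text", "message"):
            # Keep text parts; leave attached messages untouched.
            kept.append(sub)
        # Anything else (image/*, application/*, ...) is dropped.
    part.set_payload(kept)
```

Running each ham message through strip_binary_attachments() and then
comparing the -vvv output before and after, as in the runs mentioned
above, would show whether the token lists really stay the same.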

Thanks,
-Matt




