Reducing the size of the training files

Shawn Barnhart swb at grasslake.net
Wed Apr 16 16:04:32 CEST 2003


----- Original Message -----
From: "Boris 'pi' Piwinger" <3.14 at logic.univie.ac.at>

> My collection of mails for training is growing, now I have
> about 68/53 megs of ham/spam respectively (20k+/8k+ mails).
>
> Lately I have regularly been getting really huge spam
> messages (more than 300k). Since the size is mostly due to
> image or similar attachments, the bulk is of no use for
> training, but of course I don't want to delete the mails. So
> the idea is to cut down the attachments. Does someone have a
> script to do this?
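
For the attachment stripping itself, something along these lines might
serve as a starting point. It is only a sketch, assuming Python 3 and its
standard email package; strip_attachments and the placeholder text are
made-up names for illustration, not anything that ships with bogofilter,
and badly malformed spam may need extra error handling:

```python
from email import message_from_bytes, policy


def strip_attachments(raw_bytes):
    """Return the message with every non-text or attached part replaced
    by a one-line text placeholder, so headers and body text survive."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    # Materialise the walk first, since leaf parts are rewritten in place.
    for part in list(msg.walk()):
        if part.is_multipart():
            continue
        is_inline_text = (
            part.get_content_maintype() == "text"
            and part.get_content_disposition() != "attachment"
        )
        if is_inline_text:
            continue
        note = "[attachment removed: %s, %s]" % (
            part.get_content_type(),
            part.get_filename() or "unnamed",
        )
        # set_content() clears the old payload and Content-* headers.
        part.set_content(note)
    return msg.as_bytes()
```

The function handles one message at a time, so a training mailbox would
have to be split into messages first (for example with the standard
mailbox module) and reassembled afterwards.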

I've been getting a number of attachments in the 200-300k range, the English
language versions claiming to be some kind of internet security patch.  They
are actually a virus (W32.Gibe@mm).

It'd be nice if bogofilter *could* use attachments for the training process,
or at least the strings contained in the attachment.

I know there's processing overhead, but perhaps it could at least be an
option.
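
To make the "strings contained in the attachment" idea concrete, here is a
rough sketch of what such a preprocessing step could look like, in the
spirit of the Unix strings tool. It assumes Python 3 and the standard email
package; attachment_strings and min_len are hypothetical names, and nothing
here reflects how bogofilter itself works:

```python
import re
from email import message_from_bytes, policy


def attachment_strings(raw_bytes, min_len=4):
    """Collect runs of printable ASCII (at least min_len bytes long)
    from the decoded payload of every non-text or attached part."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    found = []
    for part in msg.walk():
        if part.is_multipart():
            continue
        if (part.get_content_maintype() == "text"
                and part.get_content_disposition() != "attachment"):
            continue  # ordinary body text is already tokenised as usual
        # decode=True undoes base64 / quoted-printable transfer encoding.
        payload = part.get_payload(decode=True) or b""
        found.extend(s.decode("ascii") for s in pattern.findall(payload))
    return found
```

The extracted strings could then be appended to the message text before
training, which would keep the cost optional, as suggested above.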




