Training bogofilter with Spamassassin collected spam.
Tom Anderson
tanderso at oac-design.com
Mon Aug 9 13:00:55 CEST 2004
On Mon, 2004-08-09 at 06:50, Tom Anderson wrote:
> On Sun, 2004-08-08 at 23:56, Christian Dysthe wrote:
> > I have a large mbox with spam collected by Spamassassin. All this mail has
> > been altered like Spamassassin does it: Putting a spam warning text in the
> > body of the mail, and move the spam content to an attachment. Will it
> > cause any problems using this mbox to train Bogofilter?
>
> Only that the SA header will be considered spammy. If you only train
> spam, but future hams also have a similar header, then you may get false
> positives. If you're not going to be using SA concurrent with
> bogofilter in the future, then there shouldn't be any concern with those
> headers being spammy, as nobody would intentionally add spammy headers.
> Alternatively, to remove any possible problems, you could run a bash
> script to strip out those headers before training.
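
For the header-stripping idea in the quoted message above, here is a minimal sketch in Python rather than bash. It assumes the SpamAssassin markup lives only in headers whose names start with X-Spam-, which is the usual convention but depends on the local configuration. The function names and file paths are hypothetical, not part of bogofilter or SpamAssassin.

```python
import mailbox


def strip_spam_headers(msg):
    """Remove every X-Spam-* header from a message, in place."""
    # Collect names first so we do not modify the header list while iterating.
    names = [h for h in msg.keys() if h.lower().startswith("x-spam-")]
    for name in names:
        # del removes all occurrences and is a no-op if already gone.
        del msg[name]
    return msg


def strip_mbox(src_path, dst_path):
    """Copy src mbox to dst mbox with the X-Spam-* headers removed."""
    src = mailbox.mbox(src_path, create=False)  # fail if the source is missing
    dst = mailbox.mbox(dst_path)                # created if it does not exist
    try:
        for msg in src:
            dst.add(strip_spam_headers(msg))
        dst.flush()
    finally:
        src.close()
        dst.close()
```

Something like strip_mbox("spam.mbox", "spam-clean.mbox") would then produce a copy to train from. This does not undo a rewritten Subject line or any changes to the body.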
Sorry, I misread your original message. It's the body that's altered, not
just the headers... Yeah, that'll cause a problem. Bogofilter doesn't look at
attachment content, so the actual spam text would never get trained. You'll
have to create (or find on the internet) a SpamAssassin reverser script that
strips out the MIME wrapper and leaves just the original spam intact.
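
A rough sketch of what such a reverser could look like in Python, using only the standard email module. It assumes the original spam was attached as a message/rfc822 part, which is one of the ways SpamAssassin can wrap a message. If the original was attached as plain text instead, this sketch leaves the message unchanged. The function name is made up for illustration.

```python
import email
from email.message import Message


def unwrap_spamassassin(msg: Message) -> Message:
    """Return the attached original message, or msg itself if none is found."""
    if msg.is_multipart():
        for part in msg.walk():
            if part.get_content_type() == "message/rfc822":
                payload = part.get_payload()
                # A message/rfc822 part carries a one-element list holding
                # the embedded message (assumption: the first one found is
                # the original spam).
                if isinstance(payload, list) and payload:
                    return payload[0]
    # Not a wrapper we recognise: hand the message back untouched.
    return msg
```

Running each message of the mbox through unwrap_spamassassin() before feeding it to bogofilter should give it the spam as it originally arrived. It may also be worth trying SpamAssassin's own -d (--remove-markup) option first, since it exists to remove the report markup from a message.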
Tom