Training bogofilter with Spamassassin collected spam.

Tom Anderson tanderso at oac-design.com
Mon Aug 9 13:00:55 CEST 2004


On Mon, 2004-08-09 at 06:50, Tom Anderson wrote:
> On Sun, 2004-08-08 at 23:56, Christian Dysthe wrote:
> > I have a large mbox of spam collected by SpamAssassin. All of this mail has
> > been altered the way SpamAssassin does it: a spam warning text is put in the
> > body of the mail, and the spam content is moved to an attachment. Will
> > using this mbox to train Bogofilter cause any problems?
> 
> Only that the SA header will be considered spammy.  If you only train
> spam, but future hams also have a similar header, then you may get false
> positives.  If you're not going to be using SA concurrently with
> bogofilter in the future, then there shouldn't be any concern with those
> headers being spammy, as nobody would intentionally add spammy headers.
> Alternatively, to remove any possible problems, you could run a bash
> script to strip out those headers before training.

Sorry, I misread your original message.  You're talking about the body,
not the headers.  Yes, that will cause a problem.  Bogofilter doesn't
look at attachment content, so it would train on the SpamAssassin warning
text rather than on the spam itself.  You'll have to write (or find on
the internet) a SpamAssassin reverser script that strips out the MIME
wrapper and leaves just the original spam intact.
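
As a rough illustration of such a reverser, here is a minimal Python
sketch.  It assumes SpamAssassin wrapped each spam as a message/rfc822
attachment directly under the top-level multipart (what it does with
report_safe set to 1); wrappers that attach the spam as text/plain are
not handled.  The function names unwrap_spamassassin and unwrap_mbox are
made up for this example, and messages without such an attachment are
copied through unchanged.

```python
import mailbox


def unwrap_spamassassin(msg):
    """Return the original mail from a SpamAssassin wrapper, or msg unchanged.

    Assumption: the original spam is a message/rfc822 part sitting
    directly under the top-level multipart container.
    """
    if msg.is_multipart():
        for part in msg.get_payload():
            if part.get_content_type() == "message/rfc822":
                # A parsed message/rfc822 part holds a one-element list
                # containing the embedded message.
                inner = part.get_payload()
                if isinstance(inner, list) and inner:
                    return inner[0]
    return msg


def unwrap_mbox(src_path, dst_path):
    """Copy src_path to dst_path, unwrapping every wrapped message.

    Returns the number of messages that were actually unwrapped.
    """
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)
    unwrapped = 0
    try:
        for msg in src:
            original = unwrap_spamassassin(msg)
            if original is not msg:
                unwrapped += 1
            dst.add(original)
        dst.flush()
    finally:
        dst.close()
        src.close()
    return unwrapped
```

Calling unwrap_mbox("sa-spam.mbox", "clean-spam.mbox") (both file names
are placeholders) should leave an mbox that can be fed to bogofilter as
spam, for instance with something like "bogofilter -s -M -I
clean-spam.mbox"; check the man page for your version.  SpamAssassin
itself also has a --remove-markup (-d) option that may do the same job
without a custom script.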

Tom




