Delete headers before training

Ed Blackman ed at edgewood.to
Wed Sep 29 17:09:44 CEST 2010


On Wed, Sep 29, 2010 at 08:58:28PM +0800, Denny Lin wrote:
>I have a corpus of 10,000 ham and 10,000 spam right now. While running
>some tests, I discovered that the headers SpamAssassin added (X-Spam-*)
>affect the accuracy of bogofilter.
>
>Is there a script I can use to delete the headers before training?

Piping a message through "formail -I X-Spam" removes every header
whose name begins with X-Spam.  It's unlikely that those kinds of
headers would have continuations, but formail handles them correctly
if they do.

formail is part of procmail, so it's fairly common for it to already 
be installed on Unix systems.
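
If formail is not available, the same filtering can be sketched in
Python.  This is only an illustration, not something bogofilter or
procmail ships: the function name strip_headers and its prefix
parameter are made up for this example, and it assumes the input is a
single message (not an mbox with several messages).

```python
def strip_headers(raw: bytes, prefix: bytes = b"X-Spam") -> bytes:
    """Return the message with every header starting with `prefix` removed.

    Hypothetical helper for illustration; assumes `raw` is one message.
    """
    wanted = prefix.lower()
    out = []
    in_headers = True   # True until the first blank line is seen
    dropping = False    # True while inside a header being removed
    for line in raw.splitlines(keepends=True):
        if in_headers:
            if line in (b"\n", b"\r\n"):
                # A blank line ends the header block; the body is kept as-is.
                in_headers = False
            elif line[:1] in (b" ", b"\t"):
                # A continuation line shares the fate of the header it continues.
                if dropping:
                    continue
            else:
                # Header names are case-insensitive, so compare in lower case.
                dropping = line.lower().startswith(wanted)
                if dropping:
                    continue
        out.append(line)
    return b"".join(out)
```

To use it as a filter, read the message from sys.stdin.buffer, pass the
bytes to strip_headers(), and write the result to sys.stdout.buffer.
Working on bytes keeps the rest of the message unchanged regardless of
its character set.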

Ed