Delete headers before training
Ed Blackman
ed at edgewood.to
Wed Sep 29 17:09:44 CEST 2010
On Wed, Sep 29, 2010 at 08:58:28PM +0800, Denny Lin wrote:
>I have a corpus of 10,000 ham and 10,000 spam right now. While running
>some tests, I discovered that the headers SpamAssassin added (X-Spam-*)
>affect the accuracy of bogofilter.
>
>Is there a script I can use to delete the headers before training?
Piping a message through "formail -I X-Spam" removes all X-Spam header
lines. It's unlikely that those kinds of headers would have
continuations, but formail will correctly handle them if so.
formail is part of procmail, so it's fairly common for it to already
be installed on Unix systems.
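
If formail is not available, the same cleanup can be approximated in a
few lines of Python. This is only a rough sketch, not formail's actual
implementation: the function name strip_headers and its prefix
parameter are made up for illustration, and it assumes the header block
ends at the first blank line and that folded continuation lines start
with a space or tab. It works on raw bytes so the rest of the message
is passed through unchanged.

```python
def strip_headers(raw: bytes, prefix: bytes = b"x-spam") -> bytes:
    """Drop header fields whose name starts with `prefix` (case-insensitive).

    Hypothetical helper: continuation lines of a dropped field are
    dropped too, and everything after the first blank line (the body)
    is left untouched.
    """
    out = []
    in_header = True   # True until the blank line that ends the headers
    dropping = False   # True while inside a field that is being removed
    for line in raw.splitlines(keepends=True):
        if in_header:
            if line in (b"\n", b"\r\n"):
                # Blank line: end of the header block.
                in_header = False
                dropping = False
            elif line[:1] in (b" ", b"\t"):
                # Folded continuation of the previous field.
                if dropping:
                    continue
            else:
                # Start of a new header field.
                dropping = line.lower().startswith(prefix)
                if dropping:
                    continue
        out.append(line)
    return b"".join(out)


# Small made-up message to show the effect.
sample = (
    b"From: someone@example.com\n"
    b"X-Spam-Status: Yes, score=9.1\n"
    b"\ttests=EXAMPLE_TEST\n"
    b"Subject: hello\n"
    b"\n"
    b"Body text.\n"
)
print(strip_headers(sample).decode("ascii"))
```

To use it as a filter over a corpus, read each message as bytes, pass
it through strip_headers, and write the result out before handing it
to bogofilter for training.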
Ed