Delete headers before training

RW rwmaillists at googlemail.com
Wed Sep 29 20:38:58 CEST 2010


On Wed, 29 Sep 2010 20:58:28 +0800
Denny Lin <dennylin93 at hs.ntnu.edu.tw> wrote:

> Hi,
> 
> I have a corpus of 10,000 ham and 10,000 spam right now. While running
> some tests, I discovered that the headers SpamAssassin added
> (X-Spam-*) affect the accuracy of bogofilter.
> 
> Is there a script I can use to delete the headers before training?
> Thanks.
> 

I pipe each message through this bit of awk:

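/^[^[:space:]]/  {want=1}    # unindented line: start of a new header, keep by default
/^$/ {body=1}                # the first empty line ends the header block
!body && /^(X-Bogo|X-DSPAM-|X-Spam-)/ {want=0}   # drop these headers
body || want {print}         # indented (folded) lines inherit the previous want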
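A minimal sketch of how the filter behaves, assuming a POSIX shell and an awk that understands `[[:space:]]` (gawk, mawk 1.3.4 and busybox awk do). The function name `strip_spam_headers` is hypothetical; it just wraps the awk program above unchanged and runs it on a small sample message. A folded `X-Spam-Status` header is dropped together with its indented continuation line, while a line in the body that happens to start with `X-Spam-` is left alone.

```shell
#!/bin/sh
# Hypothetical wrapper around the awk program from the post.
# Reads ONE message on stdin, writes it to stdout without
# X-Bogo*, X-DSPAM-* and X-Spam-* headers (folded lines included).
strip_spam_headers() {
    awk '
        /^[^[:space:]]/ {want=1}
        /^$/ {body=1}
        !body && /^(X-Bogo|X-DSPAM-|X-Spam-)/ {want=0}
        body || want {print}
    '
}

# Demo: sample message with a folded X-Spam-Status header.
printf '%s\n' \
    'From: alice@example.com' \
    'X-Spam-Status: No, score=-1.0' \
    '    tests=BAYES_00' \
    'Subject: hello' \
    '' \
    'X-Spam-Flag in the body is kept.' | strip_spam_headers

# Expected output:
# From: alice@example.com
# Subject: hello
#
# X-Spam-Flag in the body is kept.
```

Because `body` is never reset, the filter expects one message per run rather than a whole mbox. For a corpus stored one message per file, something like `for m in spam/*; do strip_spam_headers < "$m" | bogofilter -s; done` (with `-n` instead of `-s` for ham) should retrain bogofilter on the cleaned messages; the `spam/` directory name is only an example.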



More information about the Bogofilter mailing list