Delete headers before training
RW
rwmaillists at googlemail.com
Wed Sep 29 20:38:58 CEST 2010
On Wed, 29 Sep 2010 20:58:28 +0800
Denny Lin <dennylin93 at hs.ntnu.edu.tw> wrote:
> Hi,
>
> I have a corpus of 10,000 ham and 10,000 spam right now. While running
> some tests, I discovered that the headers SpamAssassin added
> (X-Spam-*) affect the accuracy of bogofilter.
>
> Is there a script I can use to delete the headers before training?
> Thanks.
>
I pipe the messages through this bit of awk:
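# A line that starts a new header field (not a continuation line) resets want.
/^[^[:space:]]/ {want=1}
# The first blank line ends the headers; everything after it is body.
/^$/ {body=1}
# In the headers, drop filter-added fields (and their continuation lines).
!body && /^(X-Bogo|X-DSPAM-|X-Spam-)/ {want=0}
# Print the whole body, plus every header line still wanted.
body || want {print}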
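As written, the script never clears body, so it suits one message per invocation. A minimal sketch of a variant that also copes with an mbox file, assuming messages are separated by "From " lines and that body lines beginning with "From " are already escaped (the usual mbox convention). The function name strip_filter_headers is hypothetical, and the extra first rule is an addition to the script above:

```shell
#!/bin/sh
# Sketch: remove filter-added headers from a message or an mbox on stdin.
# strip_filter_headers is a hypothetical name chosen for this example.
strip_filter_headers() {
    awk '
        # Assumption: a line starting with "From " begins a new mbox message,
        # so header processing starts over.
        /^From /        { body = 0 }
        # A new header field (not a folded continuation line) is wanted by default.
        /^[^[:space:]]/ { want = 1 }
        # The first blank line ends the headers.
        /^$/            { body = 1 }
        # Unwanted header: suppress it and any continuation lines that follow.
        !body && /^(X-Bogo|X-DSPAM-|X-Spam-)/ { want = 0 }
        body || want    { print }
    '
}
```

It can then sit in front of bogofilter's registration options, for example `strip_filter_headers < message | bogofilter -s` for a spam message and `bogofilter -n` for ham.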