Headers added by upstream spamassassin

RW rwmaillists at googlemail.com
Sat Jun 15 15:15:46 CEST 2013



On Fri, 14 Jun 2013 23:02:49 -0400
David Relson wrote:
> On Fri, 14 Jun 2013 09:38:11 -0700 (PDT)
> Charles A. Hewson wrote:

> > I have used bogofilter on an individual account for years and it has
> > worked very well. My ISP has inserted spamassassin with a site-wide
> > Bayes database. I am getting tokens like "head:h-##s-**d-....."
> > added to wordlist.db, where ## is the number of times
> > spamassassin saw the token in spam and ** is days since seen. Is
> > there a way to add regular expressions to ignore.db? Can I tell
> > bogofilter to ignore specific headers like "X-Spam-spammy"?
> > 
> 
> The first solution that comes to mind is to filter out the undesired
> header before passing the message to bogofilter, e.g.
> 
> cat message | grep -v ^X-Spam-spammy | bogofilter
> 
> or something similar.

That won't work unless multi-line headers have already been converted
to single-line: grep removes only the first line of a folded header and
leaves its continuation lines behind.
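
A quick way to see the problem, as a rough sketch: the message below is
made up (address and token values are invented for illustration), with
the X-Spam-spammy header folded over two lines.

```shell
# Made-up sample headers; the X-Spam-spammy header is folded onto a
# second line that starts with whitespace.
leftover=$(printf '%s\n' \
    'From: someone@example.com' \
    'X-Spam-spammy: head:h-12s-3d-foo,' \
    '        head:h-7s-1d-bar' \
    'Subject: test' \
  | grep -v '^X-Spam-spammy')

# The continuation line survives the filter.
printf '%s\n' "$leftover"
```

The indented continuation line is still there after grep, so its
contents would still reach bogofilter.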

The following bit of awk should do it:

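                /^[^[:space:]]/   { remove = 0 }  # not a continuation line
                /^X-Spam/         { remove = 1 }  # an X-Spam header starts here
                /^$/              { isbody = 1 }  # blank line ends the headers
                isbody || !remove { print }       # keep body and other headers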

Put it in a file, then pipe the mail through awk -f <path to script>.
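
For what it's worth, here is a minimal end-to-end sketch of that. The
temporary script file and the sample message are invented for the
example, and I've spelled the whitespace class as [ \t] (space or tab)
on the assumption that some older awks lack [:space:]; otherwise it is
the same script.

```shell
# Write the awk script to a temporary file (any path will do).
script=$(mktemp)
cat > "$script" <<'EOF'
/^[^ \t]/         { remove = 0 }  # not a continuation line
/^X-Spam/         { remove = 1 }  # an X-Spam header starts here
/^$/              { isbody = 1 }  # blank line ends the headers
isbody || !remove { print }       # keep body and other headers
EOF

# Made-up message: a folded X-Spam header, then a body line that
# happens to start with "X-Spam".
filtered=$(printf '%s\n' \
    'From: someone@example.com' \
    'X-Spam-spammy: head:h-12s-3d-foo,' \
    '        head:h-7s-1d-bar' \
    'Subject: test' \
    '' \
    'X-Spam in the body is kept.' \
  | awk -f "$script")

printf '%s\n' "$filtered"
rm -f "$script"
```

Both lines of the folded header are dropped, while the body line is
kept because isbody is already set by then. In real use the output
would be piped on to bogofilter rather than captured in a variable.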

OTOH, if your main concern isn't disk space, I'd leave them in and see
what happens. Bogofilter may find the extra information useful. If it
doesn't, you can start stripping the headers; tokens that aren't
seen don't contribute to classification.

The tokens created from the Bayes-tokens aren't likely to have much
effect if, as you say, they contain counts and ageing data. And you
may find that the ISP eventually stops adding them.


