Headers added by upstream spamassassin

Charles A. Hewson cahewson at eskimo.com
Sat Jun 15 19:15:40 CEST 2013


On Sat, 15 Jun 2013, RW wrote:

>
>
> On Fri, 14 Jun 2013 23:02:49 -0400
> David Relson wrote:
>> On Fri, 14 Jun 2013 09:38:11 -0700 (PDT)
>> Charles A. Hewson wrote:
>
>>> I have used bogofilter on an individual account for years and it has
>>> worked very well. My ISP has inserted spamassassin with a site-wide
>>> Bayes database. I am getting tokens like "head:h-##s-**d-....."
>>> added to wordlist.db, where ## is the number of times
>>> spamassassin saw the token in spam and ** is days since seen. Is
>>> there a way to add regular expressions to ignore.db? Can I tell
>>> bogofilter to ignore specific headers like "X-Spam-spammy"?
>>>
>>
>> The first solution that comes to mind is to filter out the undesired
>> header before passing the message to bogofilter, e.g.
>>
>> cat message | grep -v ^X-Spam-spammy | bogofilter
>>
>> or something similar.
>
> That won't work unless multi-line headers have already been converted
> to single-line.
>
> The following bit of awk should do it
>
>                /^[^[:space:]]/   { remove = 0 }
>                /^X-Spam/         { remove = 1 }
>                /^$/              { isbody = 1 }
>                isbody || !remove { print }
>
> Put it in a file, then pipe the mail through awk -f <path to script>.
>
> OTOH if your main concern isn't disk space, I'd leave them in and see
> what happens. Bogofilter may find the extra information useful. If it
> doesn't you can start stripping the headers - tokens that aren't
> seen don't contribute to classification.
>
> The tokens created from the Bayes-tokens aren't likely to have much
> effect if, as you say, they contain counts and ageing data. And you
> may find that the ISP eventually stops adding them.
>
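
As a minimal sketch, the awk filter quoted above can be wrapped in a shell function and placed in front of bogofilter. The function name strip_spam_headers is hypothetical, and [^ \t] is used here as a portable stand-in for [^[:space:]] on the assumption that some older awk builds lack POSIX character classes; otherwise the logic is unchanged.

```shell
# Sketch: drop X-Spam* headers, including folded continuation lines,
# from a message read on stdin. The function name is hypothetical.
strip_spam_headers() {
    awk '
        /^[^ \t]/         { remove = 0 }   # a new header line starts: stop removing
        /^X-Spam/         { remove = 1 }   # ...unless that header is an X-Spam one
        /^$/              { isbody = 1 }   # the first blank line ends the headers
        isbody || !remove { print }        # keep the body and all other headers
    '
}

# Possible usage, assuming bogofilter reads the message from stdin:
#   strip_spam_headers < message | bogofilter
```

Continuation lines begin with a space or tab, so they leave the remove flag untouched and are dropped together with the header they belong to. Once the blank line separating headers from body has been seen, every line is printed, even one that happens to start with X-Spam.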

The database has grown 27% in a short time and the same spam over three 
days generates the following:

shellx ~ $ bogoutil -d .bogofilter/wordlist.db |grep 201306 |grep head: \
| grep d--account
head:h-10s--0d--account 1 0 20130604
head:h-13s--0d--account 1 0 20130605
head:h-17s--0d--account 1 0 20130606
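
For comparison, here is a rough sketch of how such tokens could be collapsed in a dump to see what a single combined entry would look like. It only post-processes bogoutil -d output and does not modify wordlist.db. The function name collapse_sa_tokens is hypothetical, and the token pattern is an assumption based solely on the three sample lines above.

```shell
# Sketch: read "token spam-count ham-count date" lines on stdin and merge
# tokens of the form head:h-<N>s--<N>d--<word> into head:<word>.
# The function name and the pattern are assumptions, not bogofilter features.
collapse_sa_tokens() {
    awk '
        {
            tok = $1
            # strip the count/age prefix, keeping only the underlying word
            sub(/^head:h-[0-9]+s--[0-9]+d--/, "head:", tok)
            spam[tok] += $2                       # sum the spam counts
            ham[tok]  += $3                       # sum the ham counts
            if ($4 + 0 > last[tok] + 0) last[tok] = $4   # keep the newest date
        }
        END {
            # output order across different tokens is unspecified
            for (t in spam) print t, spam[t], ham[t], last[t]
        }
    '
}

# Possible usage:
#   bogoutil -d .bogofilter/wordlist.db | collapse_sa_tokens
```

Fed the three lines above, this yields one line with a spam count of 3, a ham count of 0 and the date 20130606.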

Some do not reach the database because of thresh_update. Would a single
"head:account 3 0 20130606" entry be scored differently?

charles

--
Charles Hewson <cahewson at eskimo.com>
Seattle, WA. U.S.A.

