Reducing wordlist size by ignoring DKIM headers

RW rwmaillists at googlemail.com
Sat Apr 10 13:48:10 CEST 2021


On Sat, 10 Apr 2021 11:41:21 +0200
Tomaž Šolc via bogofilter wrote:

> Hi,
> 
> I recently did some experiments in an attempt to reduce the bogofilter
> wordlist size by ignoring DKIM signatures. I found that patching the
> lexer to discard DKIM-related message headers reduces the wordlist
> size by 10% after training without affecting the false classification
> rate. I'm sharing my findings in case anyone else here finds them
> useful.


In the past, when people have asked about ignoring certain headers, they
have been told to strip them before passing the email to bogofilter.
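
For reference, that pre-filtering advice can be sketched roughly as
follows, assuming Python's standard email package. The function name
strip_headers and the default header list are made up for illustration;
neither is part of bogofilter. The returned bytes are what would be
handed to bogofilter in place of the original message.

```python
import email
from typing import Iterable

# Hypothetical default list; adjust to whatever headers you want ignored.
DEFAULT_IGNORED = ("DKIM-Signature",)


def strip_headers(raw: bytes, ignored: Iterable[str] = DEFAULT_IGNORED) -> bytes:
    """Return a copy of the raw message with the named headers removed."""
    msg = email.message_from_bytes(raw)
    for name in ignored:
        # Deleting by name is case-insensitive and removes every
        # occurrence; it does nothing if the header is absent.
        del msg[name]
    return msg.as_bytes()
```

Note that the input bytes are left untouched; only the returned copy
lacks the headers.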

Most statistical spam filters do support ignoring user-specified
headers. I'd like to see bogofilter support this too. Many of us simply
pipe emails through bogofilter and then use the X-Bogosity header.
Personally, I don't want any header to be *permanently* stripped by a
separate filter.
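
Until something like that exists, one possible workaround is to
classify a stripped scratch copy and attach the verdict to the untouched
original. This is only a sketch: annotate_original is a hypothetical
name, and classify stands for any callable that takes message bytes and
returns the text to put in X-Bogosity (for example, a wrapper around a
bogofilter invocation, which is deliberately left out here).

```python
import email
from typing import Callable, Iterable


def annotate_original(
    raw: bytes,
    classify: Callable[[bytes], str],
    ignored: Iterable[str] = ("DKIM-Signature",),
) -> bytes:
    """Score a header-stripped copy, but return the full original message."""
    original = email.message_from_bytes(raw)
    scratch = email.message_from_bytes(raw)
    for name in ignored:
        # Only the scratch copy loses the ignored headers.
        del scratch[name]
    verdict = classify(scratch.as_bytes())
    # Drop any stale verdict before adding the new one.
    del original["X-Bogosity"]
    original["X-Bogosity"] = verdict
    return original.as_bytes()
```

The classifier never sees the ignored headers, yet they survive in the
message that gets delivered.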

DKIM headers are the tip of the iceberg. Many mail systems add huge
headers with encrypted/obfuscated data encoded in a similar way.


