Filter breakers
Tom Anderson
tanderso at oac-design.com
Fri Apr 4 17:33:14 CEST 2008
Stephen,
I wrote a prefilter to handle exactly this kind of problem. The source
is available here: http://orderamidchaos.com/bogofilter/spamitarium
I've been using it for three or four years now, and it works very well
for classifying spam in which the headers play an outsized role.
If you use it, I would appreciate feedback.
Tom
http://www.linkedin.com/in/orderamidchaos
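
For readers who want a rough idea of what a header prefilter can do, here is a minimal sketch in Python. It is not spamitarium's code and makes no claim about how spamitarium works; the function name strip_headers and the list NOISY_HEADERS are made up for illustration, and the headers listed are simply ones that show up in the token dump quoted below. It only uses the Python standard library email package.

```python
from email import message_from_string

# Hypothetical drop list: headers whose tokens (head:Apr, head:Fri,
# head:X-Status, head:X-KMail-*) appear in the dump quoted below.
NOISY_HEADERS = (
    "Date",
    "Status",
    "X-Status",
    "X-KMail-EncryptionState",
    "X-KMail-MDN-Sent",
    "X-KMail-SignatureState",
)


def strip_headers(raw, drop=NOISY_HEADERS):
    """Return the message text with every header named in `drop` removed.

    Header names are matched case-insensitively; deleting a header that
    is not present is a no-op. The body is passed through untouched.
    """
    msg = message_from_string(raw)
    for name in drop:
        del msg[name]  # removes all occurrences of this header
    return msg.as_string()
```

The output of such a filter would be fed to bogofilter on standard input in place of the original message. The same filtering would have to be applied to the messages used for training, otherwise the stripped tokens stay in the word list and the counts no longer line up with what is scored.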
Stephen Davies wrote:
> I am still getting too many "obvious" spams slipping through my bogofilter
> setup.
>
> The more I investigate, the more it seems that quite innocuous headers are at
> least part of my problem.
>
> The following bogoutil output is quite common. The obviously spam components
> are outweighed by quite harmless header tokens - one of the most commonly
> appearing being the current month header (head:Apr).
>
> Is there any way to push such header tokens out of the picture?
> (In the example below, for instance, the to:anonymous token is ignored even
> though the word counts are quite skewed: 23351 to 388.)
>
> My database is some 200 MB with 3.5 million tokens.
>
> TIA,
> Stephen Davies
>
> X-Bogosity: Ham, tests=bogofilter, spamicity=0.500000, version=1.1.5
> n pgood pbad fw U
> "head:X-KMail-EncryptionState" 262 0.014395 0.000042 0.002937 +
> "head:X-KMail-MDN-Sent" 262 0.014395 0.000042 0.002937 +
> "head:X-KMail-SignatureState" 262 0.014395 0.000042 0.002937 +
> "head:X-Status" 262 0.014395 0.000042 0.002937 +
> "head:ASHT" 1 0.000057 0.000000 0.009094 +
> "head:cookie" 1 0.000057 0.000000 0.009094 +
> "rcvd:c12.groups.msn.com" 1 0.000057 0.000000 0.009094 +
> "head:Server" 84 0.003957 0.000057 0.014339 +
> "head:http" 3782 0.173596 0.002875 0.016297 +
> "head:Status" 550 0.016631 0.000990 0.056210 +
> "head:Mail" 652 0.018294 0.001268 0.064843 +
> "head:Performance" 2 0.000057 0.000004 0.066313 +
> "head:X-Server" 2 0.000057 0.000004 0.066313 +
> "head:surgemail.com" 2 0.000057 0.000004 0.066313 +
> "rcvd:SMTPSVC" 3950 0.096519 0.008634 0.082112 +
> "rcvd:Microsoft" 3948 0.096404 0.008634 0.082201 +
> "head:Apr" 161 0.003498 0.000381 0.098227 +
> "head:us-ascii" 11023 0.164994 0.031025 0.158275 -
> "head:X-User" 4 0.000057 0.000011 0.167700 -
> "head:Content-Transfer-Encoding" 48462 0.595917 0.144997 0.195700 -
> "rcvd:with" 45772 0.555313 0.137448 0.198407 -
> "head:charset" 49550 0.590010 0.149533 0.202197 -
> "rcvd:mustang.sdc.com.au" 4193 0.048632 0.012740 0.207584 -
> "head:bit" 45510 0.522338 0.138640 0.209751 -
> "one" 51375 0.585938 0.156754 0.211062 -
> "head:plain" 42478 0.466938 0.130772 0.218788 -
> "rcvd:SMTP" 16245 0.176636 0.050140 0.221100 -
> "head:text" 50496 0.528531 0.157219 0.229266 -
> "head:From" 1965 0.019212 0.006208 0.244220 -
> "rcvd:Fri" 10516 0.090153 0.034064 0.274230 -
> "rcvd:from" 73690 0.621437 0.239385 0.278089 -
> "head:Content-Type" 121137 0.920227 0.400249 0.303110 -
> "url:85" 244 0.001835 0.000807 0.305556 -
> "head:High" 348 0.002466 0.001162 0.320224 -
> "rcvd:pickup" 498 0.003498 0.001664 0.322390 -
> "rcvd:service" 501 0.003498 0.001676 0.323887 -
> "are" 84056 0.556919 0.283150 0.337056 -
> "rcvd:mail" 568 0.003670 0.001920 0.343399 -
> "head:Fri" 385 0.002409 0.001306 0.351647 -
> "the" 158589 0.889660 0.544919 0.379846 -
> "and" 156941 0.832483 0.542439 0.394524 -
> "head:Date" 122391 0.602397 0.426132 0.414312 -
> "head:Message-ID" 108546 0.531800 0.378091 0.415534 -
> "rcvd:Apr" 4258 0.020416 0.014861 0.421264 -
> "head:X-Mailer" 96664 0.345013 0.345242 0.500165 -
> "head:rootsquest.com" 0 0.000000 0.000000 0.520000 -
> "rcvd:n0d915383632d4" 0 0.000000 0.000000 0.520000 -
> "rtrn:eddy" 0 0.000000 0.000000 0.520000 -
> "rtrn:rootsquest.com" 0 0.000000 0.000000 0.520000 -
> "url:85.75.79" 0 0.000000 0.000000 0.520000 -
> "url:85.75.79.173" 0 0.000000 0.000000 0.520000 -
> "Gate" 289 0.000688 0.001055 0.605202 -
> "http" 196856 0.447095 0.720053 0.616934 -
> "to:sdc.com.au" 240854 0.467626 0.886260 0.654604 -
> "online" 17982 0.030338 0.066471 0.686623 -
> "to:anonymous" 23739 0.022252 0.088935 0.799871 -
> "trusted" 742 0.000688 0.002780 0.801579 -
> "to:scldad" 117239 0.106211 0.439462 0.805358 -
> "largest" 3764 0.001262 0.014252 0.918670 +
> "head:eddy" 1 0.000000 0.000004 0.991605 +
> "url:85.75" 2 0.000000 0.000008 0.995766 +
> "from:rootsquest.com" 4 0.000000 0.000015 0.997873 +
> "grandonliencasino.com" 4 0.000000 0.000015 0.997873 +
> "mostt" 4 0.000000 0.000015 0.997873 +
> "from:eddy" 17 0.000000 0.000065 0.999498 +
> "casino_bonus" 42 0.000000 0.000160 0.999797 +
> "subj:casino_bonus" 42 0.000000 0.000160 0.999797 +
> "from:Inman" 60 0.000000 0.000229 0.999858 +
> "from:Candace" 79 0.000000 0.000301 0.999892 +
> "casinos" 505 0.000000 0.001923 0.999983 +
> "Golden" 658 0.000000 0.002506 0.999987 +
> "casino" 2956 0.000000 0.011258 0.999997 +
> N_P_Q_S_s_x_md 31 0.000000 0.000000 0.500000
> 0.017800 0.520000 0.375000
>
>
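
The fw and U columns in the dump above can be reproduced by hand, which also shows why to:anonymous is ignored. The sketch below assumes the Robinson f(w) formula, fw = (s*x + n*p) / (s + n) with p = pbad / (pgood + pbad), and takes s = 0.0178, x = 0.52 and min_dev = 0.375 from the last line of the dump. The function names are made up for illustration, and the exact comparison bogofilter uses at the min_dev boundary is an assumption.

```python
def robinson_fw(n, pgood, pbad, s=0.0178, x=0.52):
    """Robinson f(w): pull the raw spam ratio p toward x, weighted by s.

    n is the total count for the token; pgood and pbad are the
    per-class frequencies printed in the dump. A token that was never
    seen (pgood + pbad == 0) gets the default value x.
    """
    total = pgood + pbad
    p = pbad / total if total > 0 else x
    return (s * x + n * p) / (s + n)


def is_used(fw, min_dev=0.375):
    """True if the token is far enough from 0.5 to count in the score."""
    return abs(fw - 0.5) >= min_dev


# Rows copied from the dump: (token, n, pgood, pbad)
rows = [
    ("head:ASHT", 1, 0.000057, 0.000000),
    ("head:Apr", 161, 0.003498, 0.000381),
    ("to:anonymous", 23739, 0.022252, 0.088935),
    ("rtrn:eddy", 0, 0.000000, 0.000000),
]

for token, n, pgood, pbad in rows:
    fw = robinson_fw(n, pgood, pbad)
    print(token, round(fw, 6), "+" if is_used(fw) else "-")
```

Because pgood and pbad are rounded in the dump, the recomputed fw values agree with the printed ones only to about four decimal places. With min_dev at 0.375, only tokens with fw below 0.125 or above 0.875 are counted: head:Apr (about 0.098) qualifies, while to:anonymous (about 0.80) does not, however skewed its raw counts are.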
More information about the Bogofilter mailing list