Filter breakers

Tom Anderson tanderso at oac-design.com
Fri Apr 4 17:33:14 CEST 2008


Stephen,

I wrote a prefilter to handle exactly this kind of problem.  The source 
is available here: http://orderamidchaos.com/bogofilter/spamitarium

I've been using it for 3 or 4 years now, and it works wonderfully for 
classifying spam in which the headers play an outsized role.
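
For anyone who just wants the general idea before reading the source, here is a rough sketch of what a header prefilter can look like. It is not spamitarium and makes no claim about how spamitarium works; it only drops a few headers before the message reaches the tokenizer. The drop list (Status, X-Status, X-KMail-*) is an assumption taken from the ham-heavy head: tokens in the quoted output below, and the function names are made up for illustration.

```python
import sys
from email import message_from_bytes

# Hypothetical drop list: headers added by the local mail client after
# delivery, so they say nothing about whether the message is spam.
DROP_EXACT = {"status", "x-status"}
DROP_PREFIXES = ("x-kmail-",)


def strip_noise_headers(raw: bytes) -> bytes:
    """Return the message with the unwanted headers removed."""
    msg = message_from_bytes(raw)
    # Iterate over a copy of the names, since we delete while looping.
    for name in set(msg.keys()):
        lname = name.lower()
        if lname in DROP_EXACT or lname.startswith(DROP_PREFIXES):
            del msg[name]  # removes every occurrence of this header
    return msg.as_bytes()


def main() -> None:
    """Act as a pipe filter: message in on stdin, cleaned message on stdout."""
    sys.stdout.buffer.write(strip_noise_headers(sys.stdin.buffer.read()))
```

To use it as a filter, call main() from an `if __name__ == "__main__":` guard and pipe the result into bogofilter, for both training and scoring, so that the same tokens are seen in both cases.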

If you use it, I would appreciate feedback.

Tom

http://www.linkedin.com/in/orderamidchaos


Stephen Davies wrote:
> I am still getting too many "obvious" spams slipping through my bogofilter 
> setup.
> 
> The more I investigate, the more it seems that quite innocuous headers are at 
> least part of my problem.
> 
> The following bogoutil output is quite common. The obviously spam components 
> are outweighed by quite harmless header tokens - one of the most commonly 
> appearing being the current month header (head:Apr).
> 
> Is there any way to push such header tokens out of the picture?
> (In the example below, for instance, the to:anonymous token is ignored even 
> though the word counts are quite skewed: 23351 to 388.)
> 
> My database is some 200 MB with 3.5 million tokens.
> 
> TIA,
> Stephen Davies
> 
> X-Bogosity: Ham, tests=bogofilter, spamicity=0.500000, version=1.1.5
>                                         n    pgood     pbad      fw     U
>   "head:X-KMail-EncryptionState"      262  0.014395  0.000042  0.002937 +
>   "head:X-KMail-MDN-Sent"             262  0.014395  0.000042  0.002937 +
>   "head:X-KMail-SignatureState"       262  0.014395  0.000042  0.002937 +
>   "head:X-Status"                     262  0.014395  0.000042  0.002937 +
>   "head:ASHT"                           1  0.000057  0.000000  0.009094 +
>   "head:cookie"                         1  0.000057  0.000000  0.009094 +
>   "rcvd:c12.groups.msn.com"             1  0.000057  0.000000  0.009094 +
>   "head:Server"                        84  0.003957  0.000057  0.014339 +
>   "head:http"                        3782  0.173596  0.002875  0.016297 +
>   "head:Status"                       550  0.016631  0.000990  0.056210 +
>   "head:Mail"                         652  0.018294  0.001268  0.064843 +
>   "head:Performance"                    2  0.000057  0.000004  0.066313 +
>   "head:X-Server"                       2  0.000057  0.000004  0.066313 +
>   "head:surgemail.com"                  2  0.000057  0.000004  0.066313 +
>   "rcvd:SMTPSVC"                     3950  0.096519  0.008634  0.082112 +
>   "rcvd:Microsoft"                   3948  0.096404  0.008634  0.082201 +
>   "head:Apr"                          161  0.003498  0.000381  0.098227 +
>   "head:us-ascii"                   11023  0.164994  0.031025  0.158275 -
>   "head:X-User"                         4  0.000057  0.000011  0.167700 -
>   "head:Content-Transfer-Encoding"   48462  0.595917  0.144997  0.195700 -
>   "rcvd:with"                       45772  0.555313  0.137448  0.198407 -
>   "head:charset"                    49550  0.590010  0.149533  0.202197 -
>   "rcvd:mustang.sdc.com.au"          4193  0.048632  0.012740  0.207584 -
>   "head:bit"                        45510  0.522338  0.138640  0.209751 -
>   "one"                             51375  0.585938  0.156754  0.211062 -
>   "head:plain"                      42478  0.466938  0.130772  0.218788 -
>   "rcvd:SMTP"                       16245  0.176636  0.050140  0.221100 -
>   "head:text"                       50496  0.528531  0.157219  0.229266 -
>   "head:From"                        1965  0.019212  0.006208  0.244220 -
>   "rcvd:Fri"                        10516  0.090153  0.034064  0.274230 -
>   "rcvd:from"                       73690  0.621437  0.239385  0.278089 -
>   "head:Content-Type"              121137  0.920227  0.400249  0.303110 -
>   "url:85"                            244  0.001835  0.000807  0.305556 -
>   "head:High"                         348  0.002466  0.001162  0.320224 -
>   "rcvd:pickup"                       498  0.003498  0.001664  0.322390 -
>   "rcvd:service"                      501  0.003498  0.001676  0.323887 -
>   "are"                             84056  0.556919  0.283150  0.337056 -
>   "rcvd:mail"                         568  0.003670  0.001920  0.343399 -
>   "head:Fri"                          385  0.002409  0.001306  0.351647 -
>   "the"                            158589  0.889660  0.544919  0.379846 -
>   "and"                            156941  0.832483  0.542439  0.394524 -
>   "head:Date"                      122391  0.602397  0.426132  0.414312 -
>   "head:Message-ID"                108546  0.531800  0.378091  0.415534 -
>   "rcvd:Apr"                         4258  0.020416  0.014861  0.421264 -
>   "head:X-Mailer"                   96664  0.345013  0.345242  0.500165 -
>   "head:rootsquest.com"                 0  0.000000  0.000000  0.520000 -
>   "rcvd:n0d915383632d4"                 0  0.000000  0.000000  0.520000 -
>   "rtrn:eddy"                           0  0.000000  0.000000  0.520000 -
>   "rtrn:rootsquest.com"                 0  0.000000  0.000000  0.520000 -
>   "url:85.75.79"                        0  0.000000  0.000000  0.520000 -
>   "url:85.75.79.173"                    0  0.000000  0.000000  0.520000 -
>   "Gate"                              289  0.000688  0.001055  0.605202 -
>   "http"                           196856  0.447095  0.720053  0.616934 -
>   "to:sdc.com.au"                  240854  0.467626  0.886260  0.654604 -
>   "online"                          17982  0.030338  0.066471  0.686623 -
>   "to:anonymous"                    23739  0.022252  0.088935  0.799871 -
>   "trusted"                           742  0.000688  0.002780  0.801579 -
>   "to:scldad"                      117239  0.106211  0.439462  0.805358 -
>   "largest"                          3764  0.001262  0.014252  0.918670 +
>   "head:eddy"                           1  0.000000  0.000004  0.991605 +
>   "url:85.75"                           2  0.000000  0.000008  0.995766 +
>   "from:rootsquest.com"                 4  0.000000  0.000015  0.997873 +
>   "grandonliencasino.com"               4  0.000000  0.000015  0.997873 +
>   "mostt"                               4  0.000000  0.000015  0.997873 +
>   "from:eddy"                          17  0.000000  0.000065  0.999498 +
>   "casino_bonus"                       42  0.000000  0.000160  0.999797 +
>   "subj:casino_bonus"                  42  0.000000  0.000160  0.999797 +
>   "from:Inman"                         60  0.000000  0.000229  0.999858 +
>   "from:Candace"                       79  0.000000  0.000301  0.999892 +
>   "casinos"                           505  0.000000  0.001923  0.999983 +
>   "Golden"                            658  0.000000  0.002506  0.999987 +
>   "casino"                           2956  0.000000  0.011258  0.999997 +
>   N_P_Q_S_s_x_md                       31  0.000000  0.000000  0.500000
>                                            0.017800  0.520000  0.375000
> 
> 
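
For reading the table above: the fw column looks consistent with Robinson's smoothed estimate, fw = (s*x + n*p) / (s + n) with p = pbad / (pgood + pbad), using the s, x and md values printed on the last line (0.0178, 0.52, 0.375). A token only gets a "+" in the U column when its fw is at least md away from 0.5, which is why to:anonymous (0.799871) is ignored. The sketch below is my own Python, not bogofilter's code; the function names are made up, and the combining step is the Robinson-Fisher method as I understand it.

```python
import math

# Values taken from the last line of the quoted output (s, x, md).
ROBS = 0.0178     # s: weight given to the prior
ROBX = 0.52       # x: score assumed for a token never seen before
MIN_DEV = 0.375   # md: minimum distance from 0.5 for a token to count


def fw(n, pgood, pbad, s=ROBS, x=ROBX):
    """Robinson's smoothed per-token score: (s*x + n*p) / (s + n)."""
    if n == 0:
        return x                    # unseen token: fall back to the prior
    p = pbad / (pgood + pbad)       # raw spam estimate from the two rates
    return (s * x + n * p) / (s + n)


def is_used(score, min_dev=MIN_DEV):
    """True if the score is far enough from 0.5 to take part in scoring."""
    return abs(score - 0.5) >= min_dev


def chi2_tail(x2, dof):
    """Upper-tail probability of a chi-square variable (even dof only)."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def combine(scores):
    """Robinson-Fisher combination; 0 means ham, 1 means spam."""
    dof = 2 * len(scores)
    tail_f = chi2_tail(-2.0 * sum(math.log(f) for f in scores), dof)
    tail_1mf = chi2_tail(-2.0 * sum(math.log(1.0 - f) for f in scores), dof)
    return (1.0 + tail_f - tail_1mf) / 2.0


# The fw values of the 31 tokens marked "+" in the quoted table.
USED = (
    [0.002937] * 4 + [0.009094] * 3 + [0.014339, 0.016297, 0.056210,
     0.064843] + [0.066313] * 3 + [0.082112, 0.082201, 0.098227] +
    [0.918670, 0.991605, 0.995766] + [0.997873] * 3 +
    [0.999498, 0.999797, 0.999797, 0.999858, 0.999892,
     0.999983, 0.999987, 0.999997]
)

print(round(fw(1, 0.000057, 0.0), 6))  # 0.009094, the fw shown for head:ASHT
print(is_used(0.799871))               # False: to:anonymous is left out
print(combine(USED))                   # very close to 0.5
```

With 17 strongly hammy and 14 strongly spammy tokens in play, both tail probabilities come out near zero, so they cancel and the result sits at 0.5. That would be consistent with the 0.000000 entries on the second-to-last line and with spamicity=0.500000 in the X-Bogosity header.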


