Filter breakers

Stephen Davies scldad at sdc.com.au
Fri Apr 4 04:31:28 CEST 2008


I am still getting too many "obvious" spams slipping through my bogofilter 
setup.

The more I investigate, the more it seems that quite innocuous headers are at 
least part of my problem.

The following bogoutil output is typical. The obviously spammy tokens are 
outweighed by quite harmless header tokens, one of the most common being the 
current month token (head:Apr).

Is there any way to push such header tokens out of the picture?
(In the example below, the to:anonymous token is ignored even though its 
word counts are quite skewed: 23351 to 388.)
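
As far as I can tell, the fw column follows Robinson's smoothing formula,
fw = (s*x + n*p) / (s + n) with p = pbad / (pgood + pbad), and a token is
only counted (U is "+") when fw is at least min_dev away from 0.5. The Python
sketch below checks that reading against three rows of the output further
down. It assumes that the three numbers on the last line of that output
(0.017800, 0.520000, 0.375000) are robs, robx and min_dev, in that order; the
function names are mine, not bogofilter's.

```python
# Sketch only: tries to reproduce the fw and U columns of the verbose output.
# Assumption: the last output line holds robs, robx and min_dev, in that order.
ROBS = 0.0178
ROBX = 0.52
MIN_DEV = 0.375


def robinson_fw(n, pgood, pbad, s=ROBS, x=ROBX):
    """Robinson's smoothed token score f(w) = (s*x + n*p) / (s + n)."""
    total = pgood + pbad
    # A token never seen before has no ratio of its own; fall back to x.
    p = pbad / total if total > 0 else x
    return (s * x + n * p) / (s + n)


def is_used(fw, min_dev=MIN_DEV):
    """Assumed rule: a token counts only if |fw - 0.5| >= min_dev."""
    return abs(fw - 0.5) >= min_dev


# (n, pgood, pbad) copied from three rows of the table
tokens = {
    "head:Apr": (161, 0.003498, 0.000381),
    "to:anonymous": (23739, 0.022252, 0.088935),
    "casino": (2956, 0.000000, 0.011258),
}

for name, (n, pgood, pbad) in tokens.items():
    fw = robinson_fw(n, pgood, pbad)
    print(f"{name:14s} fw={fw:.6f} used={is_used(fw)}")
```

On that reading, to:anonymous scores about 0.80, which is only 0.30 away
from 0.5, so it falls inside the min_dev band and is dropped, while head:Apr
at about 0.098 is 0.40 away and is kept.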

My database is some 200 MB with 3.5 million tokens.

TIA,
Stephen Davies

X-Bogosity: Ham, tests=bogofilter, spamicity=0.500000, version=1.1.5
                                        n    pgood     pbad      fw     U
  "head:X-KMail-EncryptionState"      262  0.014395  0.000042  0.002937 +
  "head:X-KMail-MDN-Sent"             262  0.014395  0.000042  0.002937 +
  "head:X-KMail-SignatureState"       262  0.014395  0.000042  0.002937 +
  "head:X-Status"                     262  0.014395  0.000042  0.002937 +
  "head:ASHT"                           1  0.000057  0.000000  0.009094 +
  "head:cookie"                         1  0.000057  0.000000  0.009094 +
  "rcvd:c12.groups.msn.com"             1  0.000057  0.000000  0.009094 +
  "head:Server"                        84  0.003957  0.000057  0.014339 +
  "head:http"                        3782  0.173596  0.002875  0.016297 +
  "head:Status"                       550  0.016631  0.000990  0.056210 +
  "head:Mail"                         652  0.018294  0.001268  0.064843 +
  "head:Performance"                    2  0.000057  0.000004  0.066313 +
  "head:X-Server"                       2  0.000057  0.000004  0.066313 +
  "head:surgemail.com"                  2  0.000057  0.000004  0.066313 +
  "rcvd:SMTPSVC"                     3950  0.096519  0.008634  0.082112 +
  "rcvd:Microsoft"                   3948  0.096404  0.008634  0.082201 +
  "head:Apr"                          161  0.003498  0.000381  0.098227 +
  "head:us-ascii"                   11023  0.164994  0.031025  0.158275 -
  "head:X-User"                         4  0.000057  0.000011  0.167700 -
  "head:Content-Transfer-Encoding"   48462  0.595917  0.144997  0.195700 -
  "rcvd:with"                       45772  0.555313  0.137448  0.198407 -
  "head:charset"                    49550  0.590010  0.149533  0.202197 -
  "rcvd:mustang.sdc.com.au"          4193  0.048632  0.012740  0.207584 -
  "head:bit"                        45510  0.522338  0.138640  0.209751 -
  "one"                             51375  0.585938  0.156754  0.211062 -
  "head:plain"                      42478  0.466938  0.130772  0.218788 -
  "rcvd:SMTP"                       16245  0.176636  0.050140  0.221100 -
  "head:text"                       50496  0.528531  0.157219  0.229266 -
  "head:From"                        1965  0.019212  0.006208  0.244220 -
  "rcvd:Fri"                        10516  0.090153  0.034064  0.274230 -
  "rcvd:from"                       73690  0.621437  0.239385  0.278089 -
  "head:Content-Type"              121137  0.920227  0.400249  0.303110 -
  "url:85"                            244  0.001835  0.000807  0.305556 -
  "head:High"                         348  0.002466  0.001162  0.320224 -
  "rcvd:pickup"                       498  0.003498  0.001664  0.322390 -
  "rcvd:service"                      501  0.003498  0.001676  0.323887 -
  "are"                             84056  0.556919  0.283150  0.337056 -
  "rcvd:mail"                         568  0.003670  0.001920  0.343399 -
  "head:Fri"                          385  0.002409  0.001306  0.351647 -
  "the"                            158589  0.889660  0.544919  0.379846 -
  "and"                            156941  0.832483  0.542439  0.394524 -
  "head:Date"                      122391  0.602397  0.426132  0.414312 -
  "head:Message-ID"                108546  0.531800  0.378091  0.415534 -
  "rcvd:Apr"                         4258  0.020416  0.014861  0.421264 -
  "head:X-Mailer"                   96664  0.345013  0.345242  0.500165 -
  "head:rootsquest.com"                 0  0.000000  0.000000  0.520000 -
  "rcvd:n0d915383632d4"                 0  0.000000  0.000000  0.520000 -
  "rtrn:eddy"                           0  0.000000  0.000000  0.520000 -
  "rtrn:rootsquest.com"                 0  0.000000  0.000000  0.520000 -
  "url:85.75.79"                        0  0.000000  0.000000  0.520000 -
  "url:85.75.79.173"                    0  0.000000  0.000000  0.520000 -
  "Gate"                              289  0.000688  0.001055  0.605202 -
  "http"                           196856  0.447095  0.720053  0.616934 -
  "to:sdc.com.au"                  240854  0.467626  0.886260  0.654604 -
  "online"                          17982  0.030338  0.066471  0.686623 -
  "to:anonymous"                    23739  0.022252  0.088935  0.799871 -
  "trusted"                           742  0.000688  0.002780  0.801579 -
  "to:scldad"                      117239  0.106211  0.439462  0.805358 -
  "largest"                          3764  0.001262  0.014252  0.918670 +
  "head:eddy"                           1  0.000000  0.000004  0.991605 +
  "url:85.75"                           2  0.000000  0.000008  0.995766 +
  "from:rootsquest.com"                 4  0.000000  0.000015  0.997873 +
  "grandonliencasino.com"               4  0.000000  0.000015  0.997873 +
  "mostt"                               4  0.000000  0.000015  0.997873 +
  "from:eddy"                          17  0.000000  0.000065  0.999498 +
  "casino_bonus"                       42  0.000000  0.000160  0.999797 +
  "subj:casino_bonus"                  42  0.000000  0.000160  0.999797 +
  "from:Inman"                         60  0.000000  0.000229  0.999858 +
  "from:Candace"                       79  0.000000  0.000301  0.999892 +
  "casinos"                           505  0.000000  0.001923  0.999983 +
  "Golden"                            658  0.000000  0.002506  0.999987 +
  "casino"                           2956  0.000000  0.011258  0.999997 +
  N_P_Q_S_s_x_md                       31  0.000000  0.000000  0.500000
                                           0.017800  0.520000  0.375000
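
To see why this lands on exactly 0.500000, here is a rough Python sketch of
the Fisher-style combination that Robinson describes, applied to the 31 fw
values marked "+" in the table above. It assumes the final score is
(1 + a - b) / 2, where a and b are chi-square tail probabilities computed
over the same tokens; the helper names are made up for illustration and this
is not bogofilter's actual code.

```python
import math

# fw values of the 31 tokens marked "+" in the table above
LOW_SIDE = ([0.002937] * 4 + [0.009094] * 3
            + [0.014339, 0.016297, 0.056210, 0.064843]
            + [0.066313] * 3 + [0.082112, 0.082201, 0.098227])
HIGH_SIDE = [0.918670, 0.991605, 0.995766, 0.997873, 0.997873, 0.997873,
             0.999498, 0.999797, 0.999797, 0.999858, 0.999892, 0.999983,
             0.999987, 0.999997]


def chi2_tail_even(x, dof):
    """Upper tail probability of chi-square for an even number of degrees
    of freedom: exp(-m) * sum(m**i / i!, i < dof/2) with m = x/2."""
    m = x / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def fisher_score(fws):
    """Assumed combination: (1 + a - b) / 2 over the given token scores."""
    dof = 2 * len(fws)
    # a is near 0 when many tokens have very low fw (strong ham evidence)
    a = chi2_tail_even(-2.0 * sum(math.log(f) for f in fws), dof)
    # b is near 0 when many tokens have very high fw (strong spam evidence)
    b = chi2_tail_even(-2.0 * sum(math.log(1.0 - f) for f in fws), dof)
    return (1.0 + a - b) / 2.0


print(f"all 31 tokens : {fisher_score(LOW_SIDE + HIGH_SIDE):.6f}")
print(f"high side only: {fisher_score(HIGH_SIDE):.6f}")
print(f"low side only : {fisher_score(LOW_SIDE):.6f}")
```

Under those assumptions both tail probabilities come out vanishingly small,
which would match the 0.000000 entries on the N_P_Q_S line: the 17
low-scoring tokens are about as strongly hammy as the 14 high-scoring tokens
are spammy, so the two cancel and the message sits at 0.5.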


-- 
========================================================================
This email is for the person(s) identified above, and is confidential to
the sender and the person(s).  No one else is authorised to use or
disseminate this email or its contents.

Stephen Davies Consulting                            Voice: 08-8177 1595
Adelaide, South Australia.                             Fax: 08-8177 0133
Computing & Network solutions.                       Mobile:0403 0405 83


