Filter breakers
Stephen Davies
scldad at sdc.com.au
Fri Apr 4 04:31:28 CEST 2008
I am still getting too many "obvious" spams slipping through my bogofilter
setup.
The more I investigate, the more it seems that quite innocuous headers are at
least part of my problem.
The following bogoutil output is quite typical. The obviously spammy tokens
are outweighed by quite harmless header tokens, one of the most common
being the current month header (head:Apr).
Is there any way to push such header tokens out of the picture?
(In the example below, the to:anonymous token is ignored even though its
word counts are quite skewed: 23351 to 388.)
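One possible explanation for the to:anonymous case, sketched here under
assumptions rather than taken from bogofilter's source: suppose the fw column
follows Robinson's smoothing formula, fw = (s*x + n*p) / (s + n) with
p = pbad / (pgood + pbad), and a token is only scored when fw is at least md
away from 0.5. The s, x and md values on the last line of the output
(0.017800, 0.520000, 0.375000) then reproduce the listed numbers. The function
names below are made up for illustration.

```python
# Parameters copied from the summary line of the output below:
# s = 0.017800, x = 0.520000, md = 0.375000.
ROBS = 0.0178
ROBX = 0.52
MIN_DEV = 0.375


def fw(n, pgood, pbad, s=ROBS, x=ROBX):
    """Robinson-smoothed token score (assumed formula, not bogofilter code)."""
    total = pgood + pbad
    # For a never-seen token (n == 0) the result collapses to x regardless of p.
    p = pbad / total if total > 0 else x
    return (s * x + n * p) / (s + n)


def is_scored(f, min_dev=MIN_DEV):
    """A token counts toward the result only if it is far enough from 0.5."""
    return abs(f - 0.5) >= min_dev


# Values taken from the "to:anonymous" and "head:ASHT" rows of the table.
anon = fw(23739, 0.022252, 0.088935)
asht = fw(1, 0.000057, 0.000000)

print("to:anonymous fw=%.6f scored=%s" % (anon, is_scored(anon)))
print("head:ASHT    fw=%.6f scored=%s" % (asht, is_scored(asht)))
```

Under these assumptions to:anonymous lands about 0.30 away from 0.5, inside
the 0.375 exclusion band, so it is skipped however large its counts are, while
head:ASHT, seen only once, lands near 0.009 and is scored.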
My database is some 200 MB with 3.5 million tokens.
TIA,
Stephen Davies
X-Bogosity: Ham, tests=bogofilter, spamicity=0.500000, version=1.1.5
                                       n     pgood      pbad        fw  U
"head:X-KMail-EncryptionState"       262  0.014395  0.000042  0.002937  +
"head:X-KMail-MDN-Sent"              262  0.014395  0.000042  0.002937  +
"head:X-KMail-SignatureState"        262  0.014395  0.000042  0.002937  +
"head:X-Status"                      262  0.014395  0.000042  0.002937  +
"head:ASHT"                            1  0.000057  0.000000  0.009094  +
"head:cookie"                          1  0.000057  0.000000  0.009094  +
"rcvd:c12.groups.msn.com"              1  0.000057  0.000000  0.009094  +
"head:Server"                         84  0.003957  0.000057  0.014339  +
"head:http"                         3782  0.173596  0.002875  0.016297  +
"head:Status"                        550  0.016631  0.000990  0.056210  +
"head:Mail"                          652  0.018294  0.001268  0.064843  +
"head:Performance"                     2  0.000057  0.000004  0.066313  +
"head:X-Server"                        2  0.000057  0.000004  0.066313  +
"head:surgemail.com"                   2  0.000057  0.000004  0.066313  +
"rcvd:SMTPSVC"                      3950  0.096519  0.008634  0.082112  +
"rcvd:Microsoft"                    3948  0.096404  0.008634  0.082201  +
"head:Apr"                           161  0.003498  0.000381  0.098227  +
"head:us-ascii"                    11023  0.164994  0.031025  0.158275  -
"head:X-User"                          4  0.000057  0.000011  0.167700  -
"head:Content-Transfer-Encoding"   48462  0.595917  0.144997  0.195700  -
"rcvd:with"                        45772  0.555313  0.137448  0.198407  -
"head:charset"                     49550  0.590010  0.149533  0.202197  -
"rcvd:mustang.sdc.com.au"           4193  0.048632  0.012740  0.207584  -
"head:bit"                         45510  0.522338  0.138640  0.209751  -
"one"                              51375  0.585938  0.156754  0.211062  -
"head:plain"                       42478  0.466938  0.130772  0.218788  -
"rcvd:SMTP"                        16245  0.176636  0.050140  0.221100  -
"head:text"                        50496  0.528531  0.157219  0.229266  -
"head:From"                         1965  0.019212  0.006208  0.244220  -
"rcvd:Fri"                         10516  0.090153  0.034064  0.274230  -
"rcvd:from"                        73690  0.621437  0.239385  0.278089  -
"head:Content-Type"               121137  0.920227  0.400249  0.303110  -
"url:85"                             244  0.001835  0.000807  0.305556  -
"head:High"                          348  0.002466  0.001162  0.320224  -
"rcvd:pickup"                        498  0.003498  0.001664  0.322390  -
"rcvd:service"                       501  0.003498  0.001676  0.323887  -
"are"                              84056  0.556919  0.283150  0.337056  -
"rcvd:mail"                          568  0.003670  0.001920  0.343399  -
"head:Fri"                           385  0.002409  0.001306  0.351647  -
"the"                             158589  0.889660  0.544919  0.379846  -
"and"                             156941  0.832483  0.542439  0.394524  -
"head:Date"                       122391  0.602397  0.426132  0.414312  -
"head:Message-ID"                 108546  0.531800  0.378091  0.415534  -
"rcvd:Apr"                          4258  0.020416  0.014861  0.421264  -
"head:X-Mailer"                    96664  0.345013  0.345242  0.500165  -
"head:rootsquest.com"                  0  0.000000  0.000000  0.520000  -
"rcvd:n0d915383632d4"                  0  0.000000  0.000000  0.520000  -
"rtrn:eddy"                            0  0.000000  0.000000  0.520000  -
"rtrn:rootsquest.com"                  0  0.000000  0.000000  0.520000  -
"url:85.75.79"                         0  0.000000  0.000000  0.520000  -
"url:85.75.79.173"                     0  0.000000  0.000000  0.520000  -
"Gate"                               289  0.000688  0.001055  0.605202  -
"http"                            196856  0.447095  0.720053  0.616934  -
"to:sdc.com.au"                   240854  0.467626  0.886260  0.654604  -
"online"                           17982  0.030338  0.066471  0.686623  -
"to:anonymous"                     23739  0.022252  0.088935  0.799871  -
"trusted"                            742  0.000688  0.002780  0.801579  -
"to:scldad"                       117239  0.106211  0.439462  0.805358  -
"largest"                           3764  0.001262  0.014252  0.918670  +
"head:eddy"                            1  0.000000  0.000004  0.991605  +
"url:85.75"                            2  0.000000  0.000008  0.995766  +
"from:rootsquest.com"                  4  0.000000  0.000015  0.997873  +
"grandonliencasino.com"                4  0.000000  0.000015  0.997873  +
"mostt"                                4  0.000000  0.000015  0.997873  +
"from:eddy"                           17  0.000000  0.000065  0.999498  +
"casino_bonus"                        42  0.000000  0.000160  0.999797  +
"subj:casino_bonus"                   42  0.000000  0.000160  0.999797  +
"from:Inman"                          60  0.000000  0.000229  0.999858  +
"from:Candace"                        79  0.000000  0.000301  0.999892  +
"casinos"                            505  0.000000  0.001923  0.999983  +
"Golden"                             658  0.000000  0.002506  0.999987  +
"casino"                            2956  0.000000  0.011258  0.999997  +
N_P_Q_S_s_x_md                        31  0.000000  0.000000  0.500000
                                          0.017800  0.520000  0.375000
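
As a rough check on why the result comes out at exactly 0.500000, the 31
tokens marked + can be fed through a Fisher-style chi-square combination. This
is only a sketch, written in the SpamBayes formulation of Robinson's method;
bogofilter's own P and Q may use a different sign convention, and chi2q and
fisher_score are illustrative names, not bogofilter functions. The fw values
are copied from the table above.

```python
import math


def chi2q(x2, df):
    """Upper-tail probability of a chi-square variable (even df only)."""
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def fisher_score(fws):
    """Combine token scores into one value between 0 (ham) and 1 (spam)."""
    n = len(fws)
    spam_ind = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - f) for f in fws), 2 * n)
    ham_ind = 1.0 - chi2q(-2.0 * sum(math.log(f) for f in fws), 2 * n)
    return (1.0 + spam_ind - ham_ind) / 2.0


# The 17 scored tokens with fw near 0 (mostly header tokens).
HAMMY = ([0.002937] * 4 + [0.009094] * 3
         + [0.014339, 0.016297, 0.056210, 0.064843]
         + [0.066313] * 3 + [0.082112, 0.082201, 0.098227])

# The 14 scored tokens with fw near 1.
SPAMMY = ([0.918670, 0.991605, 0.995766] + [0.997873] * 3 + [0.999498]
          + [0.999797] * 2
          + [0.999858, 0.999892, 0.999983, 0.999987, 0.999997])

print("tokens scored:      ", len(HAMMY + SPAMMY))
print("all scored tokens:  ", round(fisher_score(HAMMY + SPAMMY), 6))
print("spammy tokens only: ", round(fisher_score(SPAMMY), 6))
```

With both groups present, the hammy header tokens and the spammy tokens each
push their indicator to essentially 1, and the two cancel to about 0.5.
Dropping the header tokens from the list leaves a score near 1.0, which
suggests that the cancellation, rather than any lack of spam evidence, is what
lets this message through.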
--
========================================================================
This email is for the person(s) identified above, and is confidential to
the sender and the person(s). No one else is authorised to use or
disseminate this email or its contents.
Stephen Davies Consulting Voice: 08-8177 1595
Adelaide, South Australia. Fax: 08-8177 0133
Computing & Network solutions. Mobile:0403 0405 83