Filter breakers
Stephen Davies
scldad at sdc.com.au
Sat Apr 5 03:46:27 CEST 2008
Thanks Tom. Looks interesting.
I use sendmail, milter and amavisd to invoke bogofilter, so I'll have to think
about how your code could be included, but it certainly looks as if it could
help.
Cheers,
Stephen
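
For readers following the thread, the general prefilter idea (remove noisy headers before bogofilter tokenises the message) can be sketched as below. This is a minimal illustration only, not spamitarium's actual implementation. The function name strip_headers and the DROP_EXACT / DROP_PREFIXES lists are hypothetical choices based on the header tokens complained about in the quoted message, and the sketch assumes LF line endings.

```python
# Hypothetical header names to drop before tokenising; tune for your own mail.
DROP_EXACT = {"date", "status", "x-status"}
DROP_PREFIXES = ("x-kmail-",)


def strip_headers(raw: str) -> str:
    """Return the message with the selected headers removed.

    Assumes LF line endings and a blank line between headers and body.
    """
    head, sep, body = raw.partition("\n\n")
    kept = []
    dropping = False
    for line in head.split("\n"):
        if line[:1] in (" ", "\t"):
            # Folded continuation line: it shares the fate of its header.
            if not dropping:
                kept.append(line)
            continue
        name = line.split(":", 1)[0].strip().lower()
        dropping = name in DROP_EXACT or name.startswith(DROP_PREFIXES)
        if not dropping:
            kept.append(line)
    # The body is passed through untouched.
    return "\n".join(kept) + sep + body
```

A wrapper that reads the message from stdin, writes strip_headers() output to stdout and pipes the result into bogofilter would be one way to use it. Where that fits in a sendmail/milter/amavisd chain is left open here. Note that dropping headers also discards any genuine evidence they carry, so the lists should be kept short.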
On Saturday 05 April 2008 01:03, Tom Anderson wrote:
> Stephen,
>
> I wrote a prefilter to handle exactly this kind of problem. The source
> is available here: http://orderamidchaos.com/bogofilter/spamitarium
>
> I've been using it for 3 or 4 years now, and it works wonderfully for
> helping to classify spams in which the headers play an outsized role.
>
> If you use it, I would appreciate feedback.
>
> Tom
>
> http://www.linkedin.com/in/orderamidchaos
>
> Stephen Davies wrote:
> > I am still getting too many "obvious" spams slipping through my
> > bogofilter setup.
> >
> > The more I investigate, the more it seems that quite innocuous headers
> > are at least part of my problem.
> >
> > The following bogoutil output is quite common. The obviously spam
> > components are outweighed by quite harmless header tokens - one of the
> > most commonly appearing being the current month header (head:Apr).
> >
> > Is there any way to push such header tokens out of the picture?
> > (In the example below, the to:anonymous token is ignored even
> > though the word counts are quite skewed: 23351 to 388.)
> >
> > My database is some 200 MB with 3.5 million tokens.
> >
> > TIA,
> > Stephen Davies
> >
> > X-Bogosity: Ham, tests=bogofilter, spamicity=0.500000, version=1.1.5
> > n pgood pbad fw U
> > "head:X-KMail-EncryptionState" 262 0.014395 0.000042 0.002937 +
> > "head:X-KMail-MDN-Sent" 262 0.014395 0.000042 0.002937 +
> > "head:X-KMail-SignatureState" 262 0.014395 0.000042 0.002937 +
> > "head:X-Status" 262 0.014395 0.000042 0.002937 +
> > "head:ASHT" 1 0.000057 0.000000 0.009094 +
> > "head:cookie" 1 0.000057 0.000000 0.009094 +
> > "rcvd:c12.groups.msn.com" 1 0.000057 0.000000 0.009094 +
> > "head:Server" 84 0.003957 0.000057 0.014339 +
> > "head:http" 3782 0.173596 0.002875 0.016297 +
> > "head:Status" 550 0.016631 0.000990 0.056210 +
> > "head:Mail" 652 0.018294 0.001268 0.064843 +
> > "head:Performance" 2 0.000057 0.000004 0.066313 +
> > "head:X-Server" 2 0.000057 0.000004 0.066313 +
> > "head:surgemail.com" 2 0.000057 0.000004 0.066313 +
> > "rcvd:SMTPSVC" 3950 0.096519 0.008634 0.082112 +
> > "rcvd:Microsoft" 3948 0.096404 0.008634 0.082201 +
> > "head:Apr" 161 0.003498 0.000381 0.098227 +
> > "head:us-ascii" 11023 0.164994 0.031025 0.158275 -
> > "head:X-User" 4 0.000057 0.000011 0.167700 -
> > "head:Content-Transfer-Encoding" 48462 0.595917 0.144997 0.195700 -
> > "rcvd:with" 45772 0.555313 0.137448 0.198407 -
> > "head:charset" 49550 0.590010 0.149533 0.202197 -
> > "rcvd:mustang.sdc.com.au" 4193 0.048632 0.012740 0.207584 -
> > "head:bit" 45510 0.522338 0.138640 0.209751 -
> > "one" 51375 0.585938 0.156754 0.211062 -
> > "head:plain" 42478 0.466938 0.130772 0.218788 -
> > "rcvd:SMTP" 16245 0.176636 0.050140 0.221100 -
> > "head:text" 50496 0.528531 0.157219 0.229266 -
> > "head:From" 1965 0.019212 0.006208 0.244220 -
> > "rcvd:Fri" 10516 0.090153 0.034064 0.274230 -
> > "rcvd:from" 73690 0.621437 0.239385 0.278089 -
> > "head:Content-Type" 121137 0.920227 0.400249 0.303110 -
> > "url:85" 244 0.001835 0.000807 0.305556 -
> > "head:High" 348 0.002466 0.001162 0.320224 -
> > "rcvd:pickup" 498 0.003498 0.001664 0.322390 -
> > "rcvd:service" 501 0.003498 0.001676 0.323887 -
> > "are" 84056 0.556919 0.283150 0.337056 -
> > "rcvd:mail" 568 0.003670 0.001920 0.343399 -
> > "head:Fri" 385 0.002409 0.001306 0.351647 -
> > "the" 158589 0.889660 0.544919 0.379846 -
> > "and" 156941 0.832483 0.542439 0.394524 -
> > "head:Date" 122391 0.602397 0.426132 0.414312 -
> > "head:Message-ID" 108546 0.531800 0.378091 0.415534 -
> > "rcvd:Apr" 4258 0.020416 0.014861 0.421264 -
> > "head:X-Mailer" 96664 0.345013 0.345242 0.500165 -
> > "head:rootsquest.com" 0 0.000000 0.000000 0.520000 -
> > "rcvd:n0d915383632d4" 0 0.000000 0.000000 0.520000 -
> > "rtrn:eddy" 0 0.000000 0.000000 0.520000 -
> > "rtrn:rootsquest.com" 0 0.000000 0.000000 0.520000 -
> > "url:85.75.79" 0 0.000000 0.000000 0.520000 -
> > "url:
> >
> > SPAM-ADDRESS: 85.75.79.173
> >
> > http://www.rulesemporium.com/cgi-bin/uribl.cgi?domain0=85.75.79.173&bl0=0
> >
> > " 0 0.000000 0.000000 0.520000 -
> > "Gate" 289 0.000688 0.001055 0.605202 -
> > "http" 196856 0.447095 0.720053 0.616934 -
> > "to:sdc.com.au" 240854 0.467626 0.886260 0.654604 -
> > "online" 17982 0.030338 0.066471 0.686623 -
> > "to:anonymous" 23739 0.022252 0.088935 0.799871 -
> > "trusted" 742 0.000688 0.002780 0.801579 -
> > "to:scldad" 117239 0.106211 0.439462 0.805358 -
> > "largest" 3764 0.001262 0.014252 0.918670 +
> > "head:eddy" 1 0.000000 0.000004 0.991605 +
> > "url:85.75" 2 0.000000 0.000008 0.995766 +
> > "from:rootsquest.com" 4 0.000000 0.000015 0.997873 +
> > "
> >
> > SPAM-ADDRESS: grandonliencasino.com
> >
> > http://www.rulesemporium.com/cgi-bin/uribl.cgi?domain0=grandonliencasino.com&bl0=0
> >
> > " 4 0.000000 0.000015 0.997873 +
> > "mostt" 4 0.000000 0.000015 0.997873 +
> > "from:eddy" 17 0.000000 0.000065 0.999498 +
> > "casino_bonus" 42 0.000000 0.000160 0.999797 +
> > "subj:casino_bonus" 42 0.000000 0.000160 0.999797 +
> > "from:Inman" 60 0.000000 0.000229 0.999858 +
> > "from:Candace" 79 0.000000 0.000301 0.999892 +
> > "casinos" 505 0.000000 0.001923 0.999983 +
> > "Golden" 658 0.000000 0.002506 0.999987 +
> > "casino" 2956 0.000000 0.011258 0.999997 +
> > N_P_Q_S_s_x_md 31 0.000000 0.000000 0.500000
> > 0.017800 0.520000 0.375000
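
The fw column and the +/- flags in the output above appear to follow from the three values on the last line (s = 0.0178, x = 0.52, md = 0.375). The sketch below assumes Robinson's f(w) formula and a minimum-deviation cutoff around 0.5. The function names fw and is_used are hypothetical, and the exact boundary comparison bogofilter uses is not confirmed here. Small differences from the printed fw values are expected, because pgood and pbad are rounded in the listing.

```python
ROBS = 0.0178     # "s" on the summary line
ROBX = 0.52       # "x" on the summary line
MIN_DEV = 0.375   # "md" on the summary line


def fw(n, pgood, pbad, s=ROBS, x=ROBX):
    """Robinson's f(w): pull the raw spam ratio towards x when n is small."""
    total = pgood + pbad
    # For a token never seen before, the raw ratio is undefined; with n == 0
    # the result is x regardless of what is substituted here.
    p = pbad / total if total > 0 else x
    return (s * x + n * p) / (s + n)


def is_used(score, min_dev=MIN_DEV):
    """True if the token is far enough from 0.5 to count (assumed >=)."""
    return abs(score - 0.5) >= min_dev


# Values copied from the listing above.
apr = fw(161, 0.003498, 0.000381)          # "head:Apr", printed fw 0.098227
anon = fw(23739, 0.022252, 0.088935)       # "to:anonymous", printed fw 0.799871
print(round(apr, 3), is_used(apr))
print(round(anon, 3), is_used(anon))
```

Under these assumptions, to:anonymous is ignored because its score lies only about 0.30 from 0.5, inside the 0.375 cutoff, while head:Apr lies about 0.40 away and is counted. A smaller cutoff would admit tokens such as to:anonymous and to:scldad, but also more of the mildly hammy header tokens.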
>
--
========================================================================
This email is for the person(s) identified above, and is confidential to
the sender and the person(s). No one else is authorised to use or
disseminate this email or its contents.
Stephen Davies Consulting Voice: 08-8177 1595
Adelaide, South Australia. Fax: 08-8177 0133
Computing & Network solutions. Mobile:0403 0405 83