Filter breakers

Stephen Davies scldad at sdc.com.au
Sat Apr 5 03:46:27 CEST 2008


Thanks Tom. Looks interesting.

I use sendmail, milter and amavisd to invoke bogofilter, so I'll have to think
about how your code could be included, but it certainly looks as if it could
help.

Cheers,
Stephen

On Saturday 05 April 2008 01:03, Tom Anderson wrote:
> Stephen,
>
> I wrote a prefilter to handle exactly this kind of problem.  The source
> is available here: http://orderamidchaos.com/bogofilter/spamitarium
>
> I've been using it for 3 or 4 years now, and it works wonderfully for
> helping to classify spams in which the headers play an outsized role.
>
> If you use it, I would appreciate feedback.
>
> Tom
>
> http://www.linkedin.com/in/orderamidchaos
>
> Stephen Davies wrote:
> > I am still getting too many "obvious" spams slipping through my
> > bogofilter setup.
> >
> > The more I investigate, the more it seems that quite innocuous headers
> > are at least part of my problem.
> >
> > The following bogoutil output is quite common. The obviously spam
> > components are outweighed by quite harmless header tokens - one of the
> > most commonly appearing being the current month header (head:Apr).
> >
> > Is there any way to push such header tokens out of the picture?
> > (In the example below, for instance, the to:anonymous token is ignored
> > even though the word counts are quite skewed: 23351 to 388.)
> >
> > My database is some 200 MB with 3.5 million tokens.
> >
> > TIA,
> > Stephen Davies
> >
> > X-Bogosity: Ham, tests=bogofilter, spamicity=0.500000, version=1.1.5
> >                                         n    pgood     pbad      fw     U
> >   "head:X-KMail-EncryptionState"      262  0.014395  0.000042  0.002937 +
> >   "head:X-KMail-MDN-Sent"             262  0.014395  0.000042  0.002937 +
> >   "head:X-KMail-SignatureState"       262  0.014395  0.000042  0.002937 +
> >   "head:X-Status"                     262  0.014395  0.000042  0.002937 +
> >   "head:ASHT"                           1  0.000057  0.000000  0.009094 +
> >   "head:cookie"                         1  0.000057  0.000000  0.009094 +
> >   "rcvd:c12.groups.msn.com"             1  0.000057  0.000000  0.009094 +
> >   "head:Server"                        84  0.003957  0.000057  0.014339 +
> >   "head:http"                        3782  0.173596  0.002875  0.016297 +
> >   "head:Status"                       550  0.016631  0.000990  0.056210 +
> >   "head:Mail"                         652  0.018294  0.001268  0.064843 +
> >   "head:Performance"                    2  0.000057  0.000004  0.066313 +
> >   "head:X-Server"                       2  0.000057  0.000004  0.066313 +
> >   "head:surgemail.com"                  2  0.000057  0.000004  0.066313 +
> >   "rcvd:SMTPSVC"                     3950  0.096519  0.008634  0.082112 +
> >   "rcvd:Microsoft"                   3948  0.096404  0.008634  0.082201 +
> >   "head:Apr"                          161  0.003498  0.000381  0.098227 +
> >   "head:us-ascii"                   11023  0.164994  0.031025  0.158275 -
> >   "head:X-User"                         4  0.000057  0.000011  0.167700 -
> >   "head:Content-Transfer-Encoding"   48462  0.595917  0.144997  0.195700 -
> >   "rcvd:with"                       45772  0.555313  0.137448  0.198407 -
> >   "head:charset"                    49550  0.590010  0.149533  0.202197 -
> >   "rcvd:mustang.sdc.com.au"          4193  0.048632  0.012740  0.207584 -
> >   "head:bit"                        45510  0.522338  0.138640  0.209751 -
> >   "one"                             51375  0.585938  0.156754  0.211062 -
> >   "head:plain"                      42478  0.466938  0.130772  0.218788 -
> >   "rcvd:SMTP"                       16245  0.176636  0.050140  0.221100 -
> >   "head:text"                       50496  0.528531  0.157219  0.229266 -
> >   "head:From"                        1965  0.019212  0.006208  0.244220 -
> >   "rcvd:Fri"                        10516  0.090153  0.034064  0.274230 -
> >   "rcvd:from"                       73690  0.621437  0.239385  0.278089 -
> >   "head:Content-Type"              121137  0.920227  0.400249  0.303110 -
> >   "url:85"                            244  0.001835  0.000807  0.305556 -
> >   "head:High"                         348  0.002466  0.001162  0.320224 -
> >   "rcvd:pickup"                       498  0.003498  0.001664  0.322390 -
> >   "rcvd:service"                      501  0.003498  0.001676  0.323887 -
> >   "are"                             84056  0.556919  0.283150  0.337056 -
> >   "rcvd:mail"                         568  0.003670  0.001920  0.343399 -
> >   "head:Fri"                          385  0.002409  0.001306  0.351647 -
> >   "the"                            158589  0.889660  0.544919  0.379846 -
> >   "and"                            156941  0.832483  0.542439  0.394524 -
> >   "head:Date"                      122391  0.602397  0.426132  0.414312 -
> >   "head:Message-ID"                108546  0.531800  0.378091  0.415534 -
> >   "rcvd:Apr"                         4258  0.020416  0.014861  0.421264 -
> >   "head:X-Mailer"                   96664  0.345013  0.345242  0.500165 -
> >   "head:rootsquest.com"                 0  0.000000  0.000000  0.520000 -
> >   "rcvd:n0d915383632d4"                 0  0.000000  0.000000  0.520000 -
> >   "rtrn:eddy"                           0  0.000000  0.000000  0.520000 -
> >   "rtrn:rootsquest.com"                 0  0.000000  0.000000  0.520000 -
> >   "url:85.75.79"                        0  0.000000  0.000000  0.520000 -
> >   "url:85.75.79.173"                    0  0.000000  0.000000  0.520000 -
> >   "Gate"                              289  0.000688  0.001055  0.605202 -
> >   "http"                           196856  0.447095  0.720053  0.616934 -
> >   "to:sdc.com.au"                  240854  0.467626  0.886260  0.654604 -
> >   "online"                          17982  0.030338  0.066471  0.686623 -
> >   "to:anonymous"                    23739  0.022252  0.088935  0.799871 -
> >   "trusted"                           742  0.000688  0.002780  0.801579 -
> >   "to:scldad"                      117239  0.106211  0.439462  0.805358 -
> >   "largest"                          3764  0.001262  0.014252  0.918670 +
> >   "head:eddy"                           1  0.000000  0.000004  0.991605 +
> >   "url:85.75"                           2  0.000000  0.000008  0.995766 +
> >   "from:rootsquest.com"                 4  0.000000  0.000015  0.997873 +
> >   "grandonliencasino.com"               4  0.000000  0.000015  0.997873 +
> >   "mostt"                               4  0.000000  0.000015  0.997873 +
> >   "from:eddy"                          17  0.000000  0.000065  0.999498 +
> >   "casino_bonus"                       42  0.000000  0.000160  0.999797 +
> >   "subj:casino_bonus"                  42  0.000000  0.000160  0.999797 +
> >   "from:Inman"                         60  0.000000  0.000229  0.999858 +
> >   "from:Candace"                       79  0.000000  0.000301  0.999892 +
> >   "casinos"                           505  0.000000  0.001923  0.999983 +
> >   "Golden"                            658  0.000000  0.002506  0.999987 +
> >   "casino"                           2956  0.000000  0.011258  0.999997 +
> >   N_P_Q_S_s_x_md                       31  0.000000  0.000000  0.500000
> >                                            0.017800  0.520000  0.375000
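
For anyone reading the table above, here is a rough sketch of how the fw column and the +/- marker appear to be derived. It assumes the three values on the table's last line (0.017800, 0.520000, 0.375000) are bogofilter's robs, robx and min_dev settings, and that fw is the usual Robinson smoothing of pbad/(pgood+pbad); both are my reading of the output, not something stated in the thread. The function names are made up for illustration.

```python
# Assumed parameter meanings, read off the last line of the table.
ROBS = 0.0178     # weight given to the prior (assumed to be robs)
ROBX = 0.52       # score for a token never seen before (assumed to be robx)
MIN_DEV = 0.375   # minimum distance from 0.5 for a token to count (assumed min_dev)


def fw(n, pgood, pbad, robs=ROBS, robx=ROBX):
    """Robinson-style smoothed spam score for a single token."""
    if pgood + pbad == 0:
        # Unknown token: nothing to go on, fall back to the prior.
        return robx
    p = pbad / (pgood + pbad)
    return (robs * robx + n * p) / (robs + n)


def is_used(score, min_dev=MIN_DEV):
    """True if the token is far enough from 0.5 to be counted ('+')."""
    return abs(score - 0.5) >= min_dev


# A few rows copied from the table: (token, n, pgood, pbad)
rows = [
    ("head:ASHT", 1, 0.000057, 0.000000),
    ("head:Apr", 161, 0.003498, 0.000381),
    ("to:anonymous", 23739, 0.022252, 0.088935),
    ("casino", 2956, 0.000000, 0.011258),
]
for token, n, pgood, pbad in rows:
    score = fw(n, pgood, pbad)
    print(f"{token:15s} fw={score:.6f} used={is_used(score)}")
```

Under these assumptions, to:anonymous scores about 0.80, which is only 0.30 away from 0.5 and so falls inside the 0.375 band and is ignored, while head:Apr at about 0.098 is 0.40 away and is counted. The raw counts are skewed, but it is the ratio of the two proportions that decides whether a token gets used.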

-- 
========================================================================
This email is for the person(s) identified above, and is confidential to
the sender and the person(s).  No one else is authorised to use or
disseminate this email or its contents.

Stephen Davies Consulting                            Voice: 08-8177 1595
Adelaide, South Australia.                             Fax: 08-8177 0133
Computing & Network solutions.                       Mobile:0403 0405 83


