Question

Stephen Davies scldad at sdc.com.au
Wed May 20 07:05:55 CEST 2009


Here is an example.
The first two tokens in the output below ("Cain" and "www.lufriek.com", with
counts of 2 and 1) seem to have a disproportionate influence on the end
result.
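
For reference, the fw column in the output below is consistent with Robinson's
smoothing, fw = (s*x + n*p) / (s + n), where p = pbad / (pgood + pbad). The
following is a minimal Python sketch, not bogofilter code: the function name is
made up, and s = 0.0178 and x = 0.52 are assumed from the first two numbers on
the last line of the output. It reproduces the fw values of the first two rows
and of "May" to within rounding of the printed columns.

```python
# Hypothetical helper; s and x are assumed from the last line of the
# bogofilter -vvv output (0.017800 and 0.520000).
def robinson_fw(n, pgood, pbad, s=0.0178, x=0.52):
    """Smoothed token score: pulls the raw ratio p toward x with strength s."""
    if pgood + pbad == 0:
        return x  # a never-seen token simply gets x
    p = pbad / (pgood + pbad)
    return (s * x + n * p) / (s + n)

# (n, pgood, pbad) copied from the table
rows = {
    "Cain": (2, 0.000146, 0.000000),
    "www.lufriek.com": (1, 0.000073, 0.000000),
    "May": (16446, 0.209690, 0.042807),
}
for token, (n, pgood, pbad) in rows.items():
    print(f"{token:20s} {robinson_fw(n, pgood, pbad):.6f}")
```

Because s is so small compared with even n = 1, a token seen once or twice,
and only in ham, ends up with an fw very close to 0.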

After I feed this email through bogofilter -Sn, subsequent runs give a 
spamicity of 1.

Cheers,
Stephen

-bash-3.2# bogofilter -vvv < spam3
X-Bogosity: Ham, tests=bogofilter, spamicity=0.776434, version=1.2.0
                                        n    pgood     pbad      fw     U
  "Cain"                                2  0.000146  0.000000  0.004587 +
  "www.lufriek.com"                     1  0.000073  0.000000  0.009094 +
  "May"                             16446  0.209690  0.042807  0.169536 -
  "amp"                              9852  0.106958  0.026452  0.198274 -
  "scldad"                          41942  0.439417  0.113300  0.204987 -
  "Issue"                            5361  0.052022  0.014661  0.219867 -
  "Capell"                             39  0.000291  0.000110  0.274895 -
  "here.Copyright"                     39  0.000291  0.000110  0.274895 -
  "here.To"                            39  0.000291  0.000110  0.274895 -
  "newsletter.Issue"                   39  0.000291  0.000110  0.274895 -
  "sdc.com.au"                      70468  0.494572  0.200911  0.288880 -
  "Bill"                             1077  0.006922  0.003098  0.309210 -
  "remove"                           6423  0.040656  0.018504  0.312783 -
  "view"                            12837  0.076138  0.037204  0.328244 -
  "subj:incredible"                    94  0.000510  0.000274  0.349917 -
  "subj:diet"                         904  0.004809  0.002644  0.354763 -
  "rcvd:May"                        72404  0.369472  0.212436  0.365069 -
  "Info"                             2046  0.009763  0.006032  0.381904 -
  "mime:iso-8859-1"                 46835  0.220765  0.138205  0.385005 -
  "subj:loss"                        1388  0.006412  0.004102  0.390133 -
  "head:V6.00.2900.2180"            57798  0.248962  0.171573  0.407987 -
  "message"                        165183  0.693770  0.491112  0.414482 -
  "This"                           191332  0.796284  0.569174  0.416837 -
  "Berry"                            3636  0.014718  0.010834  0.424011 -
  "subj:weight"                      5821  0.023242  0.017359  0.427547 -
  "rcvd:from"                      181290  0.707322  0.541343  0.433537 -
  "rcvd:Wed"                        29851  0.112641  0.089303  0.442215 -
  "subj:will"                        9572  0.035337  0.028670  0.447917 -
  "subj:lead"                          62  0.000219  0.000186  0.459949 -
  "head:Content-Type"              206432  0.645537  0.623342  0.491254 -
  "Try"                              5433  0.016976  0.016406  0.491459 -
  "head:Date"                      219291  0.673880  0.662685  0.495812 -
  "Dr's"                              167  0.000510  0.000505  0.497432 -
  "mime:quoted-printable"          114172  0.333770  0.345761  0.508823 -
  "mime:plain"                     130170  0.380474  0.394212  0.508867 -
  "mime:Content-Type"              133617  0.390310  0.404662  0.509027 -
  "mime:charset"                   130277  0.380255  0.394559  0.509231 -
  "mime:Content-Transfer-Encoding"  131330  0.378652  0.397951  0.512425 -
  "mime:text"                      134505  0.382222  0.407814  0.516196 -
  "head:multipart"                 136306  0.387322  0.413275  0.516208 -
  "privacy"                         12380  0.035118  0.037538  0.516653 -
  "mime:html"                      130326  0.365392  0.395358  0.519695 -
  "Berry.More"                          0  0.000000  0.000000  0.520000 -
  "Oprah's"                             0  0.000000  0.000000  0.520000 -
  "from:friscoplano"                    0  0.000000  0.000000  0.520000 -
  "from:marshallmn.com"                 0  0.000000  0.000000  0.520000 -
  "goxjvuqvofxj"                        0  0.000000  0.000000  0.520000 -
  "head:From"                           0  --------  --------  0.520000 i
  "head:May"                            0  --------  --------  0.520000 i
  "head:OZb"                            0  0.000000  0.000000  0.520000 -
  "head:Status"                         0  --------  --------  0.520000 i
  "head:Wed"                            0  --------  --------  0.520000 i
  "head:X-KMail-EncryptionState"        0  --------  --------  0.520000 i
  "head:X-KMail-MDN-Sent"               0  --------  --------  0.520000 i
  "head:X-KMail-SignatureState"         0  --------  --------  0.520000 i
  "head:X-Status"                       0  --------  --------  0.520000 i
  "head:X-Virus-Scanned"                0  --------  --------  0.520000 i
  "head:amavisd-new"                    0  --------  --------  0.520000 i
  "head:friscoplano"                    0  0.000000  0.000000  0.520000 -
  "head:marshallmn.com"                 0  0.000000  0.000000  0.520000 -
  "head:sdc.com.au"                     0  --------  --------  0.520000 i
  "rcvd:ESMTP"                          0  --------  --------  0.520000 i
  "rcvd:mustang.sdc.com.au"             0  --------  --------  0.520000 i
  "rtrn:friscoplano"                    0  0.000000  0.000000  0.520000 -
  "rtrn:marshallmn.com"                 0  0.000000  0.000000  0.520000 -
  "url:78.171.200"                      0  0.000000  0.000000  0.520000 -
  "url:78.171.200.136"                  0  0.000000  0.000000  0.520000 -
  "weightlos"                           0  0.000000  0.000000  0.520000 -
  "secret"                           4412  0.011803  0.013409  0.531840 -
  "unsubscribe"                     23370  0.061494  0.071070  0.536119 -
  "format"                         153476  0.388415  0.467399  0.546145 -
  "unsubscribe.php"                   289  0.000729  0.000880  0.547129 -
  "head:alternative"               112408  0.276648  0.342669  0.553301 -
  "head:X-MimeOLE"                 148448  0.362987  0.452637  0.554958 -
  "head:Produced"                  149024  0.363716  0.454423  0.555435 -
  "head:Microsoft"                 150258  0.364517  0.458281  0.556979 -
  "head:MIME-Version"              197494  0.478106  0.602393  0.557514 -
  "head:Express"                   119544  0.289253  0.364637  0.557643 -
  "head:MimeOLE"                   148457  0.357158  0.452918  0.559105 -
  "head:Message-ID"                193077  0.457413  0.589353  0.563022 -
  "here"                            59764  0.140765  0.182461  0.564499 -
  "head:X-Mailer"                  178652  0.416466  0.545615  0.567119 -
  "head:X-MSMail-Priority"         127712  0.297195  0.390064  0.567565 -
  "head:Normal"                    158595  0.367140  0.484471  0.568888 -
  "click"                           34323  0.078033  0.104910  0.573459 -
  "head:X-Priority"                162488  0.367067  0.496757  0.575067 -
  "MIME"                           147671  0.331949  0.451530  0.576314 -
  "url:78"                           4726  0.010492  0.014456  0.579454 -
  "url:78.171"                         99  0.000219  0.000303  0.580822 -
  "head:Outlook"                   139700  0.302732  0.427646  0.585513 -
  "multi-part"                     144920  0.302878  0.444109  0.594534 -
  "policy"                           6454  0.013260  0.019788  0.598759 -
  "mail"                            15626  0.031840  0.047921  0.600812 -
  "Arial"                          108047  0.207577  0.331901  0.615226 -
  "face"                           132373  0.223024  0.407981  0.646558 -
  "More"                            17609  0.029508  0.054279  0.647819 -
  "our"                             82451  0.135956  0.254247  0.651575 -
  "Doctor"                           4349  0.006922  0.013421  0.659753 -
  "href"                           184887  0.286849  0.570899  0.665579 -
  "size"                           145426  0.214062  0.449551  0.677430 -
  "http"                           272478  0.388124  0.842865  0.684706 -
  "to:sdc.com.au"                  331278  0.466521  1.024985  0.687215 -
  "color"                           96446  0.129253  0.298691  0.697967 -
  "Pharmacy"                         4574  0.004736  0.014226  0.750240 -
  "Copyright"                        5097  0.005246  0.015854  0.751376 -
  "newsletter"                      10094  0.009326  0.031443  0.771246 -
  "Acai"                            17439  0.005537  0.054780  0.908197 +
  "subj:Acai"                       57043  0.006557  0.179687  0.964791 +
  "to:scldad"                      160570  0.012095  0.506077  0.976659 +
  "rcvd:with"                      129609  0.000291  0.408905  0.999288 +
  "from:Cain"                          34  0.000000  0.000107  0.999749 +
  "head:j!!"                          161  0.000000  0.000508  0.999947 +
  "rcvd:forged"                     20182  0.000000  0.063674  1.000000 +
  "rcvd:may"                        20183  0.000000  0.063678  1.000000 +
  "head:X-UIDL"                     45032  0.000000  0.142077  1.000000 +
  N_P_Q_S_s_x_md                       11  0.000000  0.552868  0.776434
                                           0.017800  0.520000  0.375000
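
The last two lines of the output can be read as N, P, Q, S and the parameters
s (robs), x (robx) and md (min_dev). The Python sketch below is an illustration
under assumptions, not bogofilter's own code: it recomputes fw from the printed
n, pgood and pbad, keeps only tokens with |fw - 0.5| >= md, and combines them
with Fisher's chi-square method using S = (1 + Q - P) / 2, which is the
relation the printed P, Q and S satisfy. All function and variable names are
made up. It includes the 11 rows marked "+" plus two "-" rows ("May" and the
never-seen "goxjvuqvofxj") to show that those are filtered out.

```python
import math

# Parameters as read from the last line of the output (s, x, md).
ROBS, ROBX, MIN_DEV = 0.0178, 0.52, 0.375

# (token, n, pgood, pbad) copied from the table.
ROWS = [
    ("Cain", 2, 0.000146, 0.000000),
    ("www.lufriek.com", 1, 0.000073, 0.000000),
    ("Acai", 17439, 0.005537, 0.054780),
    ("subj:Acai", 57043, 0.006557, 0.179687),
    ("to:scldad", 160570, 0.012095, 0.506077),
    ("rcvd:with", 129609, 0.000291, 0.408905),
    ("from:Cain", 34, 0.000000, 0.000107),
    ("head:j!!", 161, 0.000000, 0.000508),
    ("rcvd:forged", 20182, 0.000000, 0.063674),
    ("rcvd:may", 20183, 0.000000, 0.063678),
    ("head:X-UIDL", 45032, 0.000000, 0.142077),
    ("May", 16446, 0.209690, 0.042807),        # marked "-" in the table
    ("goxjvuqvofxj", 0, 0.000000, 0.000000),   # never seen, marked "-"
]

def robinson_fw(n, pgood, pbad):
    """Smoothed per-token score; never-seen tokens fall back to ROBX."""
    if pgood + pbad == 0:
        return ROBX
    p = pbad / (pgood + pbad)
    return (ROBS * ROBX + n * p) / (ROBS + n)

def chi2_sf_even(chi2, df):
    """Chi-square survival function, closed form valid for even df."""
    m = chi2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def combine(fws):
    """Return (P, Q, S) in the order used on the summary line."""
    df = 2 * len(fws)
    p = chi2_sf_even(-2.0 * sum(math.log(1.0 - f) for f in fws), df)
    q = chi2_sf_even(-2.0 * sum(math.log(f) for f in fws), df)
    return p, q, (1.0 + q - p) / 2.0

scored = [(tok, n, robinson_fw(n, g, b)) for tok, n, g, b in ROWS]
# Tokens closer than MIN_DEV to 0.5 do not take part in the combination.
used = [(tok, n, fw) for tok, n, fw in scored if abs(fw - 0.5) >= MIN_DEV]

all_used = combine([fw for _, _, fw in used])
without_rare = combine([fw for _, n, fw in used if n > 2])
print("tokens used:", len(used))
print("P=%.6f Q=%.6f S=%.6f" % all_used)
print("without the n<=2 tokens: S=%.6f" % without_rare[2])
```

With all 11 tokens the result should land close to the printed Q = 0.552868 and
S = 0.776434. Leaving out the two tokens with counts of 2 and 1 pushes S to
essentially 1, so those two rows alone account for the Ham verdict. The
never-seen token gets fw = 0.52 and is dropped by the md test, so it has no
effect on the total.
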
On Wednesday 20 May 2009 11:57:22 RW wrote:
> On Wed, 20 May 2009 10:10:37 +0930
>
> Stephen Davies <scldad at sdc.com.au> wrote:
> > > Is it possible to configure bogofilter so that it ignores tokens
> > > with an occurrence count less than N?
> > >
> > > I am seeing many obvious spams where tokens with an occurrence
> > > count of one are outweighing other tokens with big occurrence
> > > counts, big bad counts and zero good counts.
>
> Do you have an example of this? The example you gave before involved
> previous mistraining where you had learned some obviously spammy
> tokens as ham - including a Mexican IP address that's in Spamhaus XBL.
>
> > Sorry. I just realised that I have misrepresented my question.
> >
> > What I should have asked is whether it is possible to ignore (for the
> > calculation of total spamicity) tokens that have never been seen
> > before. That is, with occurrence count of zero.



-- 
=============================================================================
Stephen Davies Consulting P/L                             Voice: 08-8177 1595
Adelaide, South Australia.                                Fax  : 08-8177 0133
Computing & Network solutions.                            Mobile:040 304 0583
                                          VoIP:sip:1132210 at sip1.bbpglobal.com


