spamicity values and switches

Joerg Over over at dexia.de
Wed Apr 30 17:18:09 CEST 2003


Hi there!


I heard about the Bayesian approach to filtering spam and started
trying bogofilter about a week ago. At the moment my databases are
very small, which might be one of the reasons for the behaviour I
observed, but it still looks strange...

I'm using version 0.11.2 (and just reproduced this with 0.12.2).
The config is the default, nothing tuned to begin with.
I have 43 messages, 2476 tokens in my spamlist and 76 messages,
4144 tokens in my goodlist.
A very large share of my mails, at times around 90%, get a
spamicity of exactly 0.500000 using Robinson-Fisher (those mails
generally get 0.000000 using Graham). After feeding the classic
hardcore spam shown here into the spam database, the only value
that changes is the Robinson one.

Before feeding into spamdatabase:
============= -r ==============================================
X-Bogosity: No, tests=bogofilter, spamicity=0.457528, version=0.11.2
============= -g ==============================================
X-Bogosity: No, tests=bogofilter, spamicity=0.000000, version=0.11.2
============= -f ==============================================
X-Bogosity: No, tests=bogofilter, spamicity=0.500000, version=0.11.2

After:
============= -r ==============================================
X-Bogosity: No, tests=bogofilter, spamicity=0.530852, version=0.11.2
============= -g ==============================================
X-Bogosity: No, tests=bogofilter, spamicity=0.000000, version=0.11.2
============= -f ==============================================
X-Bogosity: No, tests=bogofilter, spamicity=0.500000, version=0.11.2

At first I thought this might have to do with being unable to
lock the db, but the values do get calculated, as you can see
here. The example uses Robinson-Fisher after feeding the mail
into the spam db:

X-Bogosity: No, tests=bogofilter, spamicity=0.500000, version=0.11.2
                                     n     pgood      pbad        fw U
"header"                             8  0.105263  0.000000  0.000052 +
"gid"                                4  0.052632  0.000000  0.000104 +
"microsoft"                          3  0.039474  0.000000  0.000138 +
"references"                         3  0.039474  0.000000  0.000138 +
"hostname"                           2  0.026316  0.000000  0.000207 +
"html"                               2  0.026316  0.000000  0.000207 +
"primary"                            2  0.026316  0.000000  0.000207 +
"127.0.0.1"                          1  0.013158  0.000000  0.000415 +
"abuse"                              1  0.013158  0.000000  0.000415 +
"localhost"                          1  0.013158  0.000000  0.000415 +
"originator"                         1  0.013158  0.000000  0.000415 +
"recipient"                          1  0.013158  0.000000  0.000415 +
"report"                             1  0.013158  0.000000  0.000415 +
"uid"                                1  0.013158  0.000000  0.000415 +
"added"                              7  0.078947  0.023256  0.227572 +
"see"                               21  0.197368  0.139535  0.414169 -
"15.4.228.67"                        0  0.000000  0.000000  0.415000 -
"195.170.96.30"                      0  0.000000  0.000000  0.415000 -
"211.49.133.26"                      0  0.000000  0.000000  0.415000 -
"54.191.49.88"                       0  0.000000  0.000000  0.415000 -
"64.106.200.170"                     0  0.000000  0.000000  0.415000 -
"67.0.8.239"                         0  0.000000  0.000000  0.415000 -
"82.91.34.87"                        0  0.000000  0.000000  0.415000 -
"caller"                             0  0.000000  0.000000  0.415000 -
"duncanthrax.net"                    0  0.000000  0.000000  0.415000 -
"en-us"                              0  0.000000  0.000000  0.415000 -
"exim4"                              0  0.000000  0.000000  0.415000 -
"exiscan"                            0  0.000000  0.000000  0.415000 -
"fetchmail-6.2.2"                    0  0.000000  0.000000  0.415000 -
"fz6arol7d"                          0  0.000000  0.000000  0.415000 -
"gecko"                              0  0.000000  0.000000  0.415000 -
"htr1um"                             0  0.000000  0.000000  0.415000 -
"httwxwcy"                           0  0.000000  0.000000  0.415000 -
"jnetamt"                            0  0.000000  0.000000  0.415000 -
"localhost.dexia.de"                 0  0.000000  0.000000  0.415000 -
"mail1.dexia.de"                     0  0.000000  0.000000  0.415000 -
"mimeole"                            0  0.000000  0.000000  0.415000 -
"mozilla"                            0  0.000000  0.000000  0.415000 -
"normal"                             0  0.000000  0.000000  0.415000 -
"precedence"                         0  0.000000  0.000000  0.415000 -
"produced"                           0  0.000000  0.000000  0.415000 -
"qmail"                              0  0.000000  0.000000  0.415000 -
"reply-to"                           0  0.000000  0.000000  0.415000 -
"single-drop"                        0  0.000000  0.000000  0.415000 -
"tequila01.sireco.de"                0  0.000000  0.000000  0.415000 -
"todd"                               0  0.000000  0.000000  0.415000 -
"track"                              0  0.000000  0.000000  0.415000 -
"ua9ltxy"                            0  0.000000  0.000000  0.415000 -
"user-agent"                         0  0.000000  0.000000  0.415000 -
"v6.00.2800.1106"                    0  0.000000  0.000000  0.415000 -
"vaamy"                              0  0.000000  0.000000  0.415000 -
"ver"                                0  0.000000  0.000000  0.415000 -
"x-antiabuse"                        0  0.000000  0.000000  0.415000 -
"x-mimeole"                          0  0.000000  0.000000  0.415000 -
"x-msmail-priority"                  0  0.000000  0.000000  0.415000 -
"x-originating-host"                 0  0.000000  0.000000  0.415000 -
"x-originating-ip"                   0  0.000000  0.000000  0.415000 -
"x-owner"                            0  0.000000  0.000000  0.415000 -
"x-priority"                         0  0.000000  0.000000  0.415000 -
"x-received-ip"                      0  0.000000  0.000000  0.415000 -
"x-scanner"                          0  0.000000  0.000000  0.415000 -
"x-uidl"                             0  0.000000  0.000000  0.415000 -
"xs-coze"                            0  0.000000  0.000000  0.415000 -
"http"                              56  0.513158  0.395349  0.435163 -
"userid"                           104  0.921053  0.790698  0.461923 -
"the"                              101  0.881579  0.790698  0.472827 -
"windows"                          115  0.973684  0.953488  0.494760 -
"apr"                              119  1.000000  1.000000  0.499999 -
"cest"                             119  1.000000  1.000000  0.499999 -
"delivered-to"                     119  1.000000  1.000000  0.499999 -
"dexia.de"                         119  1.000000  1.000000  0.499999 -
"from"                             119  1.000000  1.000000  0.499999 -
"postfix"                          119  1.000000  1.000000  0.499999 -
"received"                         119  1.000000  1.000000  0.499999 -
"return-path"                      119  1.000000  1.000000  0.499999 -
"x-original-to"                    119  1.000000  1.000000  0.499999 -
"charset"                          118  0.986842  1.000000  0.503311 -
"content-type"                     118  0.986842  1.000000  0.503311 -
"for"                              118  0.986842  1.000000  0.503311 -
"subject"                          118  0.986842  1.000000  0.503311 -
"text"                             118  0.986842  1.000000  0.503311 -
"with"                             118  0.986842  1.000000  0.503311 -
"x-mailer"                         118  0.986842  1.000000  0.503311 -
"include"                            5  0.039474  0.046512  0.540900 -
"this"                              80  0.631579  0.744186  0.540924 -
"not"                               57  0.447368  0.534884  0.544546 -
"any"                               42  0.328947  0.395349  0.545835 -
"you"                               82  0.605263  0.837209  0.580397 -
"was"                               42  0.302632  0.441860  0.593502 -
"over"                              11  0.078947  0.116279  0.595595 -
"are"                               55  0.394737  0.581395  0.595608 -
"monster"                            2  0.013158  0.023256  0.638544 +
"net"                                2  0.013158  0.023256  0.638544 +
"pop3"                               2  0.013158  0.023256  0.638544 +
"big"                                6  0.039474  0.069767  0.638618 +
"network"                           10  0.065789  0.116279  0.638633 +
"ollikahn.dexia.de"                 12  0.078947  0.139535  0.638637 +
"into"                              23  0.131579  0.302326  0.696744 +
"please"                            31  0.171053  0.418605  0.709902 +
"smtp"                              12  0.065789  0.162791  0.712157 +
"them"                              17  0.092105  0.232558  0.716288 +
"iso-8859-1"                         3  0.013158  0.046512  0.779366 +
"invoked"                            6  0.026316  0.093023  0.779426 +
"while"                             16  0.052632  0.279070  0.841302 +
"here"                              24  0.078947  0.418605  0.841311 +
"click"                              9  0.026316  0.162791  0.860792 +
"interested"                         9  0.026316  0.162791  0.860792 +
"unknown"                            6  0.013158  0.116279  0.898265 +
"wed"                                6  0.013158  0.116279  0.898265 +
"a2vaeoiwqnbp"                       1  0.000000  0.023256  0.999416 +
"alberto"                            1  0.000000  0.023256  0.999416 +
"babes"                              1  0.000000  0.023256  0.999416 +
"cock"                               1  0.000000  0.023256  0.999416 +
"cocks"                              1  0.000000  0.023256  0.999416 +
"gpuhakaz33baadhwus8"                1  0.000000  0.023256  0.999416 +
"hottest"                            1  0.000000  0.023256  0.999416 +
"julie2_uzit"                        1  0.000000  0.023256  0.999416 +
"massive"                            1  0.000000  0.023256  0.999416 +
"screaming"                          1  0.000000  0.023256  0.999416 +
"squeeze"                            1  0.000000  0.023256  0.999416 +
"tearing"                            1  0.000000  0.023256  0.999416 +
"terra.es"                           1  0.000000  0.023256  0.999416 +
"tightest"                           1  0.000000  0.023256  0.999416 +
"anymore"                            2  0.000000  0.046512  0.999708 +
"girls"                              2  0.000000  0.046512  0.999708 +
"holes"                              2  0.000000  0.046512  0.999708 +
"hot"                                2  0.000000  0.046512  0.999708 +
"nig"                                2  0.000000  0.046512  0.999708 +
"pleasure"                           2  0.000000  0.046512  0.999708 +
"bulk"                               4  0.000000  0.093023  0.999854 +
"helo"                               4  0.000000  0.093023  0.999854 +
"site"                               4  0.000000  0.093023  0.999854 +
N_P_Q_S_s_x_md                      56  0.00e+00  3.22e-12  5.00e-01
                                        1.00e-03  4.15e-01  0.100
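
For reference, the fw column above looks consistent with Robinson's
f(w) = (s*x + n*p) / (s + n), where p = pbad / (pgood + pbad), and
with s, x and md being the last three numbers of the summary line
(1.00e-03, 4.15e-01, 0.100). The sketch below is only my reading of
the printed output, not bogofilter's actual code: the function names
are made up, and whether the md cutoff is inclusive is an assumption.

```python
# Sketch only: reproduces a few fw values from the table above.
# S, X and MIN_DEV are read off the summary line; names are hypothetical.
S = 1.0e-3      # "s": weight given to the prior
X = 0.415       # "x": fw assumed for a token never seen before
MIN_DEV = 0.100 # "md": tokens closer than this to 0.5 are ignored


def fw(n, pgood, pbad, s=S, x=X):
    """Robinson's f(w) for a token seen n times in total."""
    if n == 0 or pgood + pbad == 0.0:
        return x  # no data at all: fall back to the prior
    p = pbad / (pgood + pbad)
    return (s * x + n * p) / (s + n)


def is_used(f, min_dev=MIN_DEV):
    """True if the token would be marked '+' (assumed inclusive cutoff)."""
    return abs(f - 0.5) >= min_dev


for name, n, pgood, pbad in [
    ("header", 8, 0.105263, 0.000000),
    ("added", 7, 0.078947, 0.023256),
    ("see", 21, 0.197368, 0.139535),
    ("caller", 0, 0.000000, 0.000000),
    ("babes", 1, 0.000000, 0.023256),
]:
    f = fw(n, pgood, pbad)
    print("%-10s %.6f %s" % (name, f, "+" if is_used(f) else "-"))
```

The last decimal can differ from the table because pgood and pbad are
themselves rounded in the printout.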

I believe that, especially with a small database, feeding a spam
into the spam database should trigger a significant value
change... which does happen, but only with the Robinson method.
The other two seem totally unimpressed.
I'd also think that getting exactly 0.5 or 0.0 as spamicity
should be a rather rare occurrence.
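
To make the 0.5 a bit more concrete, here is a rough sketch of the
chi-square (Fisher) combining as Robinson describes it, fed with the
56 fw values marked "+" in the table above. This is an assumption
about what -f does, not taken from the bogofilter 0.11.2 source, and
I don't claim to know which of the two tails bogofilter prints as P
and which as Q. The helper chi2q and all other names are made up.

```python
import math


def chi2q(x2, df):
    """Upper-tail chi-square probability; only valid for even df."""
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


# (fw value, number of '+' tokens with that value), copied from the table.
FW_COUNTS = [
    (0.000052, 1), (0.000104, 1), (0.000138, 2), (0.000207, 3),
    (0.000415, 7), (0.227572, 1), (0.638544, 3), (0.638618, 1),
    (0.638633, 1), (0.638637, 1), (0.696744, 1), (0.709902, 1),
    (0.712157, 1), (0.716288, 1), (0.779366, 1), (0.779426, 1),
    (0.841302, 1), (0.841311, 1), (0.860792, 2), (0.898265, 2),
    (0.999416, 14), (0.999708, 6), (0.999854, 3),
]
fws = [value for value, count in FW_COUNTS for _ in range(count)]
n = len(fws)

# Tail is tiny when many tokens look spammy (fw near 1) ...
spam_tail = chi2q(-2.0 * sum(math.log(1.0 - f) for f in fws), 2 * n)
# ... and this one is tiny when many tokens look hammy (fw near 0).
ham_tail = chi2q(-2.0 * sum(math.log(f) for f in fws), 2 * n)

spam_ind = 1.0 - spam_tail
ham_ind = 1.0 - ham_tail
score = (1.0 + spam_ind - ham_ind) / 2.0
print(n, spam_tail, ham_tail, "%.6f" % score)
```

With these inputs both tails come out far below 1e-9, so the two
indicators cancel and the score is 0.5 to six decimals: the mail has
strong evidence in both directions at once, which this kind of
combining reports as "can't tell".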

Can you tell me why that is? Should I provide more info on
anything? (I will, just tell me how.) Is this normal under the
circumstances, and if so, why?

Btw, the bogofilter-faq.html
(http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/*checkout*/bogofilter/bogofilter/doc/bogofilter-faq.html)
also consistently shows a 0.500000 spamicity... coincidence?

If this is old news, sorry, I didn't find anything in the
archives. In that case, just point me there.

Greetings, jo
-- 
+-------------------------------------------------------------------+
|  __ __ __ __ _ _          just another pointless signature        |
| / _ \ V / -_) '_/                                                 |
| \___/\_/\___|_|                                                   |
+-------------------------------------------------------------------+
