More understanding bogofilter

Javier Castillo Alcibar javier.castillo at euroview-spain.com
Thu May 8 08:42:03 CEST 2003


Daniel,

My goodlist is very small compared to my spamlist:

linux-list:~# bogoutil -w .bogofilter .MSG_COUNT
                       spam   good
.MSG_COUNT            10970     59

As Greg suggested, I guess this is the problem...

Javier.
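
The effect of the unequal list sizes can be checked by hand. The following is a minimal sketch, assuming bogofilter's Robinson mode uses Gary Robinson's published formula f(w) = (s*x + n*p) / (s + n), with the token counts normalized by the .MSG_COUNT values above. The values s = 0.1 and x = 0.5 are read from the N_P_Q_S_s_x_md line of the "bogofilter -R -vv" output quoted below; that reading of the line is an assumption, and the function name is only illustrative.

```python
def robinson_fw(spam_count, good_count, spam_msgs, good_msgs, s=0.1, x=0.5):
    """Robinson's f(w) for one token (sketch, not bogofilter's actual code)."""
    n = spam_count + good_count
    if n == 0:
        return x  # unseen token: fall back to the prior x
    # Normalize raw counts by the size of each list.
    pbad = spam_count / spam_msgs    # 1/10970, shown as 0.000091
    pgood = good_count / good_msgs   # 1/59, shown as 0.016949
    p = pbad / (pbad + pgood)
    # Blend the observed p with the prior x, weighted by s.
    return (s * x + n * p) / (s + n)

# "spamfilter": seen once in spam, once in good; .MSG_COUNT is 10970 / 59.
fw = robinson_fw(1, 1, spam_msgs=10970, good_msgs=59)
print(f"{fw:.6f}")  # 0.028904, the fw column of the -R -vv output
```

One hit in 59 good messages is a much larger fraction than one hit in 10970 spam messages, so the token looks strongly "good" even though the raw counts are equal.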


-----Original Message-----
From: David Relson [mailto:relson at osagesoftware.com]
Sent: Wednesday, May 7, 2003 20:11
To: bogofilter at aotto.com
Subject: Re: More understanding bogofilter


At 10:33 AM 5/7/03, Javier Castillo Alcibar wrote:



>  Hello all,
>
>  If a token appears just once in the goodlist and just once in the
>spamlist, why does the goodlist have a bigger "weight" when calculating
>the score? This is an example with my database:
>
>linux-list:~# echo spamfilter | bogofilter -R -vv
>X-Bogosity: No, tests=bogofilter, spamicity=0.028904, version=0.12.2
>                                      n    pgood     pbad      fw  invfwlog    fwlog  U
>"spamfilter"                         2  0.016949  0.000091  0.028904  -0.02933  -3.54376 +
>N_P_Q_S_s_x_md                       1  9.71e-01  2.89e-02  2.89e-02  1.00e-01  5.00e-01 0.440
>
>linux-list:~# bogoutil -p .bogofilter spamfilter
>                        spam    good  Gra prob  Rob prob
>spamfilter                1       1  0.400000  0.028057

Javier,

The Graham probability has a built-in bias towards "good" words, so that 
words that haven't been seen often are considered non-spam.  If you're not 
using Graham, you can ignore that value.

You haven't included the value of .MSG_COUNT, so I can't tell what's going 
on.  Below is the output from a simple test I just ran.  It creates a new 
directory and adds messages (of 3 words each) to the spam and good list and 
displays their scores.  Then it adds another message to the good list and 
again displays scores.  Looking at the values, I think you'll get a better 
idea of what's happening.

David

[relson at osage relson]$ mkdir test.d

[relson at osage relson]$ echo alpha beta gamma | bogofilter -s -d test.d -C

[relson at osage relson]$ echo beta gamma delta | bogofilter -n -d test.d -C

[relson at osage relson]$ bogoutil -p test.d alpha beta gamma delta .MSG_COUNT
                        spam    good  Gra prob  Rob prob
alpha                     1       0  0.400000  0.999416
beta                      1       1  0.400000  0.499958
gamma                     1       1  0.400000  0.499958
delta                     0       1  0.400000  0.000415
.MSG_COUNT                1       1  0.400000  0.499958

[relson at osage relson]$ echo gamma delta | bogofilter -n -d test.d -C

[relson at osage relson]$ bogoutil -p test.d alpha beta gamma delta .MSG_COUNT
                        spam    good  Gra prob  Rob prob
alpha                     1       0  0.400000  0.999416
beta                      1       1  0.400000  0.666499
gamma                     1       2  0.400000  0.499958
delta                     0       2  0.400000  0.000415
.MSG_COUNT                1       2  0.400000  0.499958
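
The "Rob prob" column of the first table (one spam message, one good message) can be reproduced with a short sketch. It assumes Gary Robinson's formula f(w) = (s*x + n*p) / (s + n) with s = 0.001 and x = 0.415. Those two values are not stated anywhere in this thread; they are assumptions chosen because they reproduce the first table, and the function name is illustrative only. The sketch checks the first table only.

```python
def robinson_fw(spam_count, good_count, spam_msgs, good_msgs, s, x):
    """Robinson's f(w) for one token (sketch, not bogofilter's actual code)."""
    n = spam_count + good_count
    if n == 0:
        return x  # unseen token: fall back to the prior x
    pbad = spam_count / spam_msgs    # fraction of spam messages with the token
    pgood = good_count / good_msgs   # fraction of good messages with the token
    p = pbad / (pbad + pgood)
    return (s * x + n * p) / (s + n)

S, X = 0.001, 0.415  # assumed parameters, see the note above

# (spam, good) counts from the first table; .MSG_COUNT there is 1 / 1.
counts = {"alpha": (1, 0), "beta": (1, 1), "gamma": (1, 1), "delta": (0, 1)}
scores = {tok: robinson_fw(sp, gd, 1, 1, S, X) for tok, (sp, gd) in counts.items()}

for tok, fw in scores.items():
    print(f"{tok:<8}{fw:.6f}")
# alpha   0.999416
# beta    0.499958
# gamma   0.499958
# delta   0.000415
```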








