More understanding bogofilter

David Relson relson at osagesoftware.com
Wed May 7 20:10:31 CEST 2003


At 10:33 AM 5/7/03, Javier Castillo Alcibar wrote:



>  Hello all,
>
>  if a token appears just one time in goodlist, and just one time in
>spamlist, why does goodlist carry a bigger "weight" when calculating
>the score? This is an example with my database:
>
>linux-list:~# echo spamfilter | bogofilter -R -vv
>X-Bogosity: No, tests=bogofilter, spamicity=0.028904, version=0.12.2
>                                      n    pgood     pbad      fw   invfwlog    fwlog  U
>"spamfilter"                         2  0.016949  0.000091  0.028904  -0.02933  -3.54376 +
>N_P_Q_S_s_x_md                       1  9.71e-01  2.89e-02  2.89e-02  1.00e-01  5.00e-01 0.440
>
>linux-list:~# bogoutil -p .bogofilter spamfilter
>                        spam    good  Gra prob  Rob prob
>spamfilter                1       1  0.400000  0.028057

Javier,

The Graham probability has a built-in bias towards "good" words, so
words that haven't been seen often are considered non-spam.  If you're not
using Graham, you can ignore that value.
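
As a rough illustration of that bias, here is a minimal Python sketch of the
per-word rule described in Paul Graham's "A Plan for Spam" essay.  It is not
bogofilter's source: the name graham_prob and its parameters are hypothetical,
and it is an assumption that bogofilter's Graham mode follows the essay's
doubling of good counts, five-occurrence threshold, and 0.4 default.

```python
def graham_prob(good, bad, ngood, nbad):
    """Per-word spam probability, following the rule in Graham's essay (sketch)."""
    g = 2 * good              # good occurrences count double: the bias toward "good"
    b = bad
    if g + b < 5:             # rarely seen words get the mildly innocent default
        return 0.4
    bad_ratio = min(1.0, b / nbad)
    good_ratio = min(1.0, g / ngood)
    p = bad_ratio / (good_ratio + bad_ratio)
    return max(0.01, min(0.99, p))   # clamp away from certainty


# A token seen once in each list falls under the threshold.
print(graham_prob(good=1, bad=1, ngood=1, nbad=1))
```

Under these assumptions, any token with only a handful of occurrences lands
on 0.4 regardless of which list it came from, which would match the constant
"Gra prob" column in the bogoutil output.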

You haven't included the value of .MSG_COUNT, so I can't tell what's going
on.  Below is the output from a simple test I just ran.  It creates a new
directory, adds one message to the spam list and one to the good list (3
words each), and displays the scores.  Then it adds another message to the
good list and displays the scores again.  Looking at the values, I think
you'll get a better idea of what's happening.
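
To see why .MSG_COUNT matters, a short Python sketch may help.  The message
totals used here (59 good, 10989 spam) are not taken from any database; they
are back-calculated from the pgood and pbad columns in the quoted output, on
the assumption that each column is the token count divided by the number of
messages of that kind.  The ratio p is the usual Robinson-style formula,
assumed rather than copied from bogofilter's source.

```python
def count_ratio(token_count, msg_count):
    """Fraction of messages of one kind that contained the token."""
    return token_count / msg_count


good_msgs = 59       # assumption: roughly 1 / 0.016949
spam_msgs = 10989    # assumption: roughly 1 / 0.000091 (pbad is heavily rounded)

pgood = count_ratio(1, good_msgs)   # token seen once in the good list
pbad = count_ratio(1, spam_msgs)    # token seen once in the spam list
p = pbad / (pbad + pgood)           # raw spamicity before any smoothing

print(f"pgood={pgood:.6f} pbad={pbad:.6f} p={p:.6f}")
```

With such lopsided totals, one hit in a small good list outweighs one hit in
a large spam list, so p comes out far below 0.5 even though the raw counts
are equal.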

David

[relson at osage relson]$ mkdir test.d

[relson at osage relson]$ echo alpha beta gamma | bogofilter -s -d test.d -C
[relson at osage relson]$ echo beta gamma delta | bogofilter -n -d test.d -C

[relson at osage relson]$ bogoutil -p test.d alpha beta gamma delta .MSG_COUNT
                        spam    good  Gra prob  Rob prob
alpha                     1       0  0.400000  0.999416
beta                      1       1  0.400000  0.499958
gamma                     1       1  0.400000  0.499958
delta                     0       1  0.400000  0.000415
.MSG_COUNT                1       1  0.400000  0.499958
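
The "Rob prob" column in the table above can be approximated with Robinson's
f(w) = (s*x + n*p) / (s + n).  The Python sketch below is an illustration,
not bogofilter's code: the function name robinson_fw is hypothetical, and the
parameter values s=0.001 and x=0.415 are assumptions chosen because they
appear to reproduce the numbers shown; the real values depend on the local
configuration.

```python
def robinson_fw(spam, good, spam_msgs, good_msgs, s=0.001, x=0.415):
    """Robinson-style f(w) for one token (sketch; s and x are assumed values)."""
    n = spam + good                 # total occurrences of the token
    if n == 0:
        return x                    # unseen token: fall back to the prior x
    pbad = spam / spam_msgs         # share of spam messages with the token
    pgood = good / good_msgs        # share of good messages with the token
    p = pbad / (pbad + pgood)       # raw spamicity
    return (s * x + n * p) / (s + n)


# Counts from the first table: one spam message and one good message.
for word, spam, good in [("alpha", 1, 0), ("beta", 1, 1),
                         ("gamma", 1, 1), ("delta", 0, 1)]:
    print(f"{word:<8}{spam:>4}{good:>4}  {robinson_fw(spam, good, 1, 1):.6f}")
```

Because pbad and pgood are divided by the message totals before being
compared, changing .MSG_COUNT shifts the score of a token whose own counts
have not changed.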

[relson at osage relson]$ echo gamma delta | bogofilter -n -d test.d -C

[relson at osage relson]$ bogoutil -p test.d alpha beta gamma delta .MSG_COUNT
                        spam    good  Gra prob  Rob prob
alpha                     1       0  0.400000  0.999416
beta                      1       1  0.400000  0.666499
gamma                     1       2  0.400000  0.499958
delta                     0       2  0.400000  0.000415
.MSG_COUNT                1       2  0.400000  0.499958






