More understanding bogofilter
David Relson
relson at osagesoftware.com
Wed May 7 20:10:31 CEST 2003
At 10:33 AM 5/7/03, Javier Castillo Alcibar wrote:
> Hello all,
>
> if a token appears just one time in goodlist, and just one time in
>spamlist, why does goodlist have a bigger "weight" when calculating the
>score? This is an example with my database:
>
>linux-list:~# echo spamfilter | bogofilter -R -vv
>X-Bogosity: No, tests=bogofilter, spamicity=0.028904, version=0.12.2
>                  n  pgood     pbad      fw        invfwlog  fwlog     U
>"spamfilter"      2  0.016949  0.000091  0.028904  -0.02933  -3.54376  +
>N_P_Q_S_s_x_md    1  9.71e-01  2.89e-02  2.89e-02  1.00e-01  5.00e-01  0.440
>
>linux-list:~# bogoutil -p .bogofilter spamfilter
> spam good Gra prob Rob prob
>spamfilter 1 1 0.400000 0.028057
Javier,
The Graham probability has a builtin bias towards "good" words, so that
words that haven't been seen often are considered non-spam. If you're not
using Graham, you can ignore that value.
You haven't included the value of .MSG_COUNT, so I can't tell what's going
on. Below is the output from a simple test I just ran. It creates a new
directory and adds messages (of 3 words each) to the spam and good list and
displays their scores. Then it adds another message to the good list and
again displays scores. Looking at the values, I think you'll get a better
idea of what's happening.
David
[relson at osage relson]$ mkdir test.d
[relson at osage relson]$ echo alpha beta gamma | bogofilter -s -d test.d -C
[relson at osage relson]$ echo beta gamma delta | bogofilter -n -d test.d -C
[relson at osage relson]$ bogoutil -p test.d alpha beta gamma delta .MSG_COUNT
             spam  good  Gra prob  Rob prob
alpha           1     0  0.400000  0.999416
beta            1     1  0.400000  0.499958
gamma           1     1  0.400000  0.499958
delta           0     1  0.400000  0.000415
.MSG_COUNT      1     1  0.400000  0.499958
[relson at osage relson]$ echo gamma delta | bogofilter -n -d test.d -C
[relson at osage relson]$ bogoutil -p test.d alpha beta gamma delta .MSG_COUNT
             spam  good  Gra prob  Rob prob
alpha           1     0  0.400000  0.999416
beta            1     1  0.400000  0.666499
gamma           1     2  0.400000  0.499958
delta           0     2  0.400000  0.000415
.MSG_COUNT      1     2  0.400000  0.499958
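
The Rob prob column in the two listings can be reproduced from the counts.
What follows is a minimal sketch in Python, not bogofilter's actual code. It
assumes each count is first divided by its list's .MSG_COUNT, and that the
Robinson parameters are s = 0.001 and x = 0.415. The formula and both
parameter values are inferred from the numbers shown in this message; the
name rob_prob is a hypothetical helper.

```python
# Assumed Robinson parameters, inferred by fitting the tables in this message.
ROBS = 0.001  # s: how strongly the prior x pulls on rarely seen words
ROBX = 0.415  # x: score assumed for a word with no history


def rob_prob(spam, good, spam_msgs, good_msgs, s=ROBS, x=ROBX):
    """Score a word from per-message frequencies rather than raw counts."""
    pbad = spam / spam_msgs    # fraction of spam messages containing the word
    pgood = good / good_msgs   # fraction of good messages containing the word
    return (s * x + pbad) / (s + pbad + pgood)


# Second listing: .MSG_COUNT is 1 spam message and 2 good messages.
for word, spam, good in [("alpha", 1, 0), ("beta", 1, 1),
                         ("gamma", 1, 2), ("delta", 0, 2)]:
    print(f"{word:<6} {spam} {good} {rob_prob(spam, good, 1, 2):.6f}")
```

Adding the second good message halves beta's good frequency (1 of 2 good
messages instead of 1 of 1), so its score rises from 0.499958 to 0.666499
even though its raw counts are unchanged. Applied to the pgood and pbad
values in the -R output quoted at the top, the same expression gives roughly
0.028, so a token seen once in each list is weighted by how large each list
is, not by the raw counts.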