A small note on the value of robs
David Relson
relson at osagesoftware.com
Fri Apr 18 16:21:48 CEST 2003
Greetings,
In Greg's write-up, he mentions a message of mine that gave spamicity
scores of 0.999 at s=0.001, and 0.505 at s=1e-8.
When I originally received the message, it scored 0.502152, which put it
squarely in the middle of the unsure range. Since I have bogofilter
generate histograms for Unsure messages (see below), I could see that
it had 9 low scoring tokens and 60 high scoring tokens. With those counts
the message should be scored as spam, no? Anyhow, I experimented a bit and
found that a different value of robs gives the different score that Greg
mentions.
X-Bogosity: Unsure, tests=bogofilter, spamicity=0.502152, version=0.11.1.9
int   cnt  prob      spamicity  histogram
0.00    6  0.013462  0.001734   ######
0.10    3  0.130203  0.009890   ###
0.20    0  0.000000  0.009890
0.30    0  0.000000  0.009890
0.40    0  0.000000  0.009890
0.50    0  0.000000  0.009890
0.60    0  0.000000  0.009890
0.70    0  0.000000  0.009890
0.80    6  0.875019  0.190652   ######
0.90   54  0.959375  0.573234   ################################################
To better see how robs and min_dev affect spamicity, I scored the message
against my wordlists with a range of robs and min_dev values. In the small
chart below, each row is a robs value and each column a min_dev value
(Y = spam, U = unsure). It shows clearly that as the value of robs
decreases, so does the score, which falls toward 0.5.
robs \ min_dev     0.35          0.40          0.45
1e-0            Y 1.000000    Y 1.000000    Y 1.000000
1e-1            Y 1.000000    Y 1.000000    Y 0.964735
1e-2            Y 0.999308    Y 0.999146    U 0.691315
1e-3            Y 0.953902    U 0.936217    U 0.510367
1e-4            U 0.723863    U 0.672107    U 0.500107
1e-5            U 0.537633    U 0.519495    U 0.500000
1e-6            U 0.502152    U 0.500693    U 0.500000
1e-7            U 0.500049    U 0.500010    U 0.500000
1e-8            U 0.500001    U 0.500000    U 0.500000
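
For anyone who wants to reproduce the trend without a wordlist, here is a
minimal Python sketch. It is not bogofilter's code. It assumes the
Robinson-Fisher method: Robinson's smoothing f(w) = (s*x + n*p)/(s + n)
with robs as s, tokens closer than min_dev to 0.5 dropped, and Fisher's
chi-square combining. The token list (60 tokens seen only in spam, 9 seen
only in ham, 5 occurrences each), x = 0.5, and the default min_dev of 0.1
are made-up values for illustration, not data from the message above.

```python
import math

def chi2q(x2, df):
    """Chi-square survival function; valid for even df only."""
    m = x2 / 2.0
    term = total = math.exp(-m)  # underflows to 0.0 when m is very large
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def fw(p, n, s, x):
    """Robinson's smoothed token probability; s plays the role of robs."""
    return (s * x + n * p) / (s + n)

def score(tokens, s, x=0.5, min_dev=0.1):
    """Combine per-token probabilities into one spamicity value."""
    fs = [fw(p, n, s, x) for p, n in tokens]
    # Ignore tokens whose probability is within min_dev of neutral (0.5).
    fs = [f for f in fs if abs(f - 0.5) >= min_dev]
    if not fs:
        return 0.5  # this sketch's choice: no evidence means neutral
    df = 2 * len(fs)
    from_f = chi2q(-2.0 * sum(math.log(f) for f in fs), df)
    from_1mf = chi2q(-2.0 * sum(math.log(1.0 - f) for f in fs), df)
    return (1.0 + from_f - from_1mf) / 2.0

# Hypothetical tokens as (p, n): raw spam probability, occurrence count.
TOKENS = [(1.0, 5)] * 60 + [(0.0, 5)] * 9

for exponent in range(0, 9):
    s = 10.0 ** -exponent
    print(f"1e-{exponent}  {score(TOKENS, s):.6f}")
```

With these made-up counts the score is close to 1.0 at s = 1 and close to
0.5 at s = 1e-8. As s shrinks, the ham-only tokens get f(w) very near 0 and
the spam-only tokens get f(w) very near 1, so both chi-square terms are
driven toward 0 and the combined score settles at 0.5, even though the
spammy tokens far outnumber the hammy ones.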
Hope this helps explain some of our new findings.
David