A small note on the value of robs

David Relson relson at osagesoftware.com
Fri Apr 18 16:21:48 CEST 2003


Greetings,

In Greg's write-up, he mentions a message of mine that gave spamicity 
scores of 0.999 at s=0.001 and 0.505 at s=1e-8 (s being the robs parameter).

When I originally received the message, it was scored at 0.502152, which put 
it squarely in the middle of the unsure range. Since I have bogofilter 
generate the histograms for Unsure messages (see below), I could see that 
it had 9 low scoring tokens and 60 high scoring tokens. With those counts 
the message should be scored as spam, no? Anyhow, I experimented a bit and 
found that a different robs gave the different score that Greg mentions.

X-Bogosity: Unsure, tests=bogofilter, spamicity=0.502152, version=0.11.1.9
    int  cnt   prob  spamicity histogram
   0.00    6 0.013462 0.001734 ######
   0.10    3 0.130203 0.009890 ###
   0.20    0 0.000000 0.009890
   0.30    0 0.000000 0.009890
   0.40    0 0.000000 0.009890
   0.50    0 0.000000 0.009890
   0.60    0 0.000000 0.009890
   0.70    0 0.000000 0.009890
   0.80    6 0.875019 0.190652 ######
   0.90   54 0.959375 0.573234 ################################################


To better see how robs and min_dev affect spamicity, I scored the 
message with a range of robs and min_dev values and my wordlists. The 
small chart below shows clearly that as the value of robs decreases, the 
score drops toward 0.5, and that it drops sooner at higher min_dev.

robs   min_dev=0.35  min_dev=0.40  min_dev=0.45
1e-0   Y 1.000000    Y 1.000000    Y 1.000000
1e-1   Y 1.000000    Y 1.000000    Y 0.964735
1e-2   Y 0.999308    Y 0.999146    U 0.691315
1e-3   Y 0.953902    U 0.936217    U 0.510367
1e-4   U 0.723863    U 0.672107    U 0.500107
1e-5   U 0.537633    U 0.519495    U 0.500000
1e-6   U 0.502152    U 0.500693    U 0.500000
1e-7   U 0.500049    U 0.500010    U 0.500000
1e-8   U 0.500001    U 0.500000    U 0.500000
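
One way to see why a very small robs can pull a message with mostly high
scoring tokens back to 0.5 is to work through the arithmetic. The sketch
below is not bogofilter's code. It assumes Robinson's smoothing
f(w) = (s*x + n*p)/(s + n), with s playing the role of robs and x of robx,
and a Fisher (chi-square) combination of the resulting token scores. The
function names, the min_dev handling, and the token data (9 ham-only and
60 spam-only tokens, each seen once) are made up for illustration and are
not taken from the wordlists used above.

```python
import math

def robinson_f(p, n, s, x=0.5):
    """Robinson-smoothed token score: p is pulled toward x when n is small relative to s."""
    return (s * x + n * p) / (s + n)

def chi2q(chi, df):
    """Upper-tail chi-square probability; valid for even df only."""
    m = chi / 2.0
    term = total = math.exp(-m)
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def fisher_score(fs, min_dev=0.0):
    # Assumed min_dev behaviour: ignore tokens too close to the neutral 0.5.
    used = [f for f in fs if abs(f - 0.5) >= min_dev]
    df = 2 * len(used)
    q = chi2q(-2.0 * sum(math.log(f) for f in used), df)        # small when ham evidence is strong
    p = chi2q(-2.0 * sum(math.log(1.0 - f) for f in used), df)  # small when spam evidence is strong
    return (1.0 + q - p) / 2.0

def toy_score(s):
    # Hypothetical message: 9 tokens seen once, only in ham (p=0.0),
    # and 60 tokens seen once, only in spam (p=1.0).
    fs = [robinson_f(0.0, 1, s)] * 9 + [robinson_f(1.0, 1, s)] * 60
    return fisher_score(fs)

for s in (1.0, 1e-8):
    print(f"robs={s:g}  score={toy_score(s):.6f}")
```

With these made-up tokens, s=1 leaves every token at 0.25 or 0.75 and the
combined score lands well above 0.9. At s=1e-8 the 9 ham tokens sit so
close to 0 that both chi-square tails vanish and the score collapses to 0.5,
even though the high scoring tokens outnumber them.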

Hope this helps explain some of our new findings.

David




