Robinson-Fisher use / viewing tokens

Barry Gould BarryGould at PennySaverUSA.net
Tue Jan 21 03:49:56 CET 2003


At 06:34 PM 1/20/2003, David Relson wrote:

>"-v" gives a minimal level of detail (1 line).  "-vv" generates the 
>histogram that you see (11 lines). "-vvv" generates the complete Rtable, 
>which is 75 characters wide.

Well, it doesn't fit on my 80-col terminal window!
If I expand the terminal to 91 cols minimum, it does fit.


>That's a big message - 1500+ distinct tokens, with values all over the map!
>
>'Tis useful to have "min_dev=0.1".  This "takes out" the tokens which are 
>not already known to the wordlists since the spamicity calculation gives 
>them a 0.415 score.  The 0.1 setting pretty much clears out the 0.40 
>and 0.50 lines.  For your message, it'd cut the count by 550 or so 
>words.  Try the min_dev setting and send the results to the list.  My 
>guesstimate is that the spamicity value won't change a whole lot.


Strangely, it doesn't change at all! It _IS_ using the config file, as the 
format string is altered, but the min_dev adjustment doesn't affect the 
results.
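
For anyone following along, here is a rough Python sketch of what I understand
min_dev to do: drop every token whose probability is closer than min_dev to the
neutral 0.5 before the scores are combined. The function name, the sample
probabilities, and the ">=" comparison are my own assumptions for illustration;
this is not bogofilter's actual code.

```python
def filter_by_min_dev(probs, min_dev):
    """Keep only token probabilities at least min_dev away from the neutral 0.5.

    Hypothetical helper: whether the real filter uses >= or > is an assumption.
    """
    return [p for p in probs if abs(p - 0.5) >= min_dev]


# Made-up sample values; 0.415 is the score David mentions for unknown tokens.
sample = [0.05, 0.415, 0.52, 0.65, 0.94]

kept_01 = filter_by_min_dev(sample, 0.1)  # 0.415 and 0.52 are dropped
kept_00 = filter_by_min_dev(sample, 0.0)  # nothing is dropped

print(kept_01)
print(len(kept_00))
```

If the setting were taking effect, I would expect the two runs to differ in
exactly the way kept_01 and kept_00 do here.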

I've been changing config files around (which accounts for the change in 
score perhaps), etc, so I ran it again twice with the old dbs (backups from 
earlier today):

min_dev 0.1:

X-Bogosity: Unsure, tests=bogofilter fisher, spamicity=0.506104, version=0.10.0
           int  cnt    prob   spamicity  histogram
          0.00  110  0.050663  0.017752  #############
          0.10  137  0.149492  0.052285  #################
          0.20  137  0.255764  0.098169  #################
          0.30  194  0.351084  0.167156  #######################
          0.40    0  0.000000  0.167156
          0.50    0  0.000000  0.167156
          0.60  425  0.649922  0.377501  ##################################################
          0.70  223  0.749716  0.445874  ###########################
          0.80  109  0.846461  0.478153  #############
          0.90   53  0.942208  0.502651  #######


min_dev 0.0:

X-Bogosity: Unsure, tests=bogofilter fisher, spamicity=0.506104, version=0.10.0
           int  cnt    prob   spamicity  histogram
          0.00  110  0.050663  0.017752  #############
          0.10  137  0.149492  0.052285  #################
          0.20  137  0.255764  0.098169  #################
          0.30  194  0.351084  0.167156  #######################
          0.40    0  0.000000  0.167156
          0.50    0  0.000000  0.167156
          0.60  425  0.649922  0.377501  ##################################################
          0.70  223  0.749716  0.445874  ###########################
          0.80  109  0.846461  0.478153  #############
          0.90   53  0.942208  0.502651  #######
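
To compare runs without eyeballing them, a small Python sketch like the one
below can pull the numeric columns out of the "-vv" histogram, total the token
counts, and list the empty rows. The parser and its names are hypothetical and
only assume the column layout shown above (int, cnt, prob, spamicity); the data
is copied from the output in this message.

```python
HISTOGRAM = """\
 int  cnt    prob   spamicity  histogram
0.00  110  0.050663  0.017752  #############
0.10  137  0.149492  0.052285  #################
0.20  137  0.255764  0.098169  #################
0.30  194  0.351084  0.167156  #######################
0.40    0  0.000000  0.167156
0.50    0  0.000000  0.167156
0.60  425  0.649922  0.377501  ##################################################
0.70  223  0.749716  0.445874  ###########################
0.80  109  0.846461  0.478153  #############
0.90   53  0.942208  0.502651  #######
"""


def parse_histogram(text):
    """Return (interval, count, prob, spamicity) tuples for each data row."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # blank lines or a wrapped row of '#' characters
        try:
            row = (float(fields[0]), int(fields[1]),
                   float(fields[2]), float(fields[3]))
        except ValueError:
            continue  # the column-header line or any other non-numeric line
        rows.append(row)
    return rows


rows = parse_histogram(HISTOGRAM)
total_tokens = sum(cnt for _, cnt, _, _ in rows)
empty_rows = [low for low, cnt, _, _ in rows if cnt == 0]

print(total_tokens)
print(empty_rows)
```

Run against both outputs above, it gives the same totals, and the 0.40 and 0.50
rows are empty in both.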


BTW, I'm getting some VERY strange results on another message...
I'll send info to the list shortly.

Thanks,
Barry




