Robinson-Fisher use / viewing tokens

David Relson relson at osagesoftware.com
Tue Jan 21 04:00:20 CET 2003


At 09:49 PM 1/20/03, Barry Gould wrote:

>At 06:34 PM 1/20/2003, David Relson wrote:
>
>>"-v" gives a minimal level of detail (1 line).  "-vv" generates the 
>>histogram that you see (11 lines). "-vvv" generates the complete Rtable, 
>>which is 75 characters wide.
>
>Well, it doesn't fit on my 80-col terminal window!
>If I expand the terminal to 91 cols minimum, it does fit.
>
>
>>That's a big message - 1500+ distinct tokens, with values all over the map!
>>
>>'Tis useful to have "min_dev=0.1".  This "takes out" the tokens which are 
>>not already known to the wordlists since the spamicity calculation gives 
>>them a 0.415 score.  The 0.1 setting pretty much clears out the 
>>0.40 and 0.50 lines.  For your message, it'd cut the count by 550 or so 
>>words.  Try the min_dev setting and send the results to the list.  My 
>>guesstimate is that the spamicity value won't change a whole lot.
>
>
>Strangely, it doesn't change at all! It _IS_ using the config file, as the 
>format string is altered, but the min_dev adjustment doesn't affect the 
>results.
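
A minimal sketch of the min_dev filtering described in the quoted text, assuming a token is counted only when its score is at least min_dev away from 0.5, and that tokens unknown to the wordlists get the 0.415 score mentioned above. Since |0.415 - 0.5| = 0.085 is below 0.1, those unknown tokens drop out. The names `passes_min_dev` and `UNKNOWN_SCORE` are hypothetical; this is not bogofilter's code.

```python
# Hypothetical sketch, not bogofilter's code: assumes a token is ignored
# when its score lies within min_dev of even odds (0.5).
EVEN_ODDS = 0.5
UNKNOWN_SCORE = 0.415  # score given to tokens not in the wordlists (per the message)

def passes_min_dev(score: float, min_dev: float) -> bool:
    """Return True if the token is far enough from 0.5 to be counted."""
    return abs(score - EVEN_ODDS) >= min_dev

scores = [0.01, 0.30, UNKNOWN_SCORE, 0.48, 0.55, 0.92]
kept = [s for s in scores if passes_min_dev(s, 0.1)]
print(kept)
```

With min_dev=0.1 only the clearly hammy and clearly spammy scores survive; the 0.415 unknown and the near-neutral 0.48 and 0.55 are skipped.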

I seem to recall some "special" code for setting values from command line 
and config file options.  An earlier value may be blocking a later 
setting.  I'll have to look.
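
The suspicion that an earlier value blocks a later one can be illustrated with a hypothetical sketch: if the option-setting code stores a value only when none is set yet, whichever source is read first wins and a later config-file min_dev is silently ignored. The `Options` class, both setters, and the 0.0 starting value are invented for illustration and say nothing about how bogofilter actually parses its options.

```python
# Hypothetical illustration of a "first value wins" option bug.
# Not bogofilter's code; all names and values are invented.

class Options:
    def __init__(self) -> None:
        self.values: dict[str, float] = {}

    def set_first_wins(self, name: str, value: float) -> None:
        # Buggy: an earlier value blocks any later setting of the same option.
        if name not in self.values:
            self.values[name] = value

    def set_last_wins(self, name: str, value: float) -> None:
        # Intended: later sources (config file, then command line) override earlier ones.
        self.values[name] = value

buggy = Options()
buggy.set_first_wins("min_dev", 0.0)  # hypothetical earlier value
buggy.set_first_wins("min_dev", 0.1)  # config-file value is silently dropped
print(buggy.values["min_dev"])

fixed = Options()
fixed.set_last_wins("min_dev", 0.0)
fixed.set_last_wins("min_dev", 0.1)   # config-file value takes effect
print(fixed.values["min_dev"])
```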

>I've been changing config files around (which perhaps accounts for the 
>change in score), etc., so I ran it again twice with the old dbs (backups 
>from earlier today):

Interesting.  Both histograms look like min_dev=0.1 is in effect.  The 
"hole in the middle" (the nearly empty 0.40 and 0.50 lines) is my clue.
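
A rough sketch of why min_dev leaves that hole, assuming the "-vv" histogram bins token scores into ten 0.1-wide buckets labelled 0.00 through 0.90 and that tokens within min_dev of 0.5 are dropped before counting. The bucket layout, the `histogram` function, and the sample scores are all assumptions for illustration, not bogofilter's actual output or data.

```python
# Hypothetical sketch: bin token scores into ten 0.1-wide buckets after
# dropping near-neutral tokens, which empties the 0.40 and 0.50 buckets.

def histogram(scores: list[float], min_dev: float) -> list[int]:
    counts = [0] * 10
    for s in scores:
        if abs(s - 0.5) < min_dev:
            continue  # assumed min_dev rule: skip tokens too close to 0.5
        counts[min(int(s * 10), 9)] += 1  # bucket 0..9; a score of 1.0 lands in the last bucket
    return counts

scores = [0.02, 0.05, 0.15, 0.25, 0.35, 0.415, 0.415, 0.45,
          0.52, 0.58, 0.65, 0.72, 0.85, 0.97, 0.99]
for i, n in enumerate(histogram(scores, 0.1)):
    print(f"{i / 10:4.2f} {'#' * n}")
```

Running the same scores with min_dev=0.0 fills the middle buckets back in, which is the difference to look for when checking whether the setting took effect.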

>BTW, I'm getting some VERY strange results on another message...
>I'll send info to the list shortly.

I've got a couple of other problems to deal with first.  I'm afraid you'll 
have to take a number :-(

More information about the Bogofilter mailing list