Robinson-Fisher use / viewing tokens
David Relson
relson at osagesoftware.com
Tue Jan 21 04:00:20 CET 2003
At 09:49 PM 1/20/03, Barry Gould wrote:
>At 06:34 PM 1/20/2003, David Relson wrote:
>
>>"-v" gives a minimal level of detail (1 line). "-vv" generates the
>>histogram that you see (11 lines). "-vvv" generates the complete Rtable,
>>which is 75 characters wide.
>
>Well, it doesn't fit on my 80-col terminal window!
>If I expand the terminal to 91 cols minimum, it does fit.
>
>
>>That's a big message - 1500+ distinct tokens, with values all over the map!
>>
>>'Tis useful to have "min_dev=0.1". This "takes out" the tokens which are
>>not already known to the wordlists since the spamicity calculation gives
>>them a 0.415 score. The 0.1 setting pretty much clears out the
>>0.40 and 0.50 lines. For your message, it'd cut the count by 550 or so
>>words. Try the min_dev setting and send the results to the list. My
>>guesstimate is that the spamicity value won't change a whole lot.
>
>
>Strangely, it doesn't change at all! It _IS_ using the config file, as the
>format string is altered, but the min_dev adjustment doesn't affect the
>results.
I seem to recall some "special" code for setting values from command line
and config file options. An earlier value may be blocking a later
setting. I'll have to look.
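
To make the suspected failure mode concrete, here is a minimal sketch of how a "first value wins" rule could cause an earlier value to block a later one. This is purely illustrative and is not bogofilter's actual code: the class name `FirstWinsSetting` and its `set` method are hypothetical, invented for this example.

```python
class FirstWinsSetting:
    """Hypothetical option holder: the first explicit value locks the option."""

    def __init__(self, default):
        self.value = default
        self.locked = False

    def set(self, value):
        """Store value unless an earlier source already set it.

        Returns True if the value was stored, False if it was ignored.
        """
        if self.locked:
            return False
        self.value = value
        self.locked = True
        return True


# An earlier source (say, a command line default) sets min_dev first ...
min_dev = FirstWinsSetting(default=0.0)
min_dev.set(0.0)

# ... so a later source (say, the config file) is silently ignored.
accepted = min_dev.set(0.1)
print(accepted, min_dev.value)
```

If something like this is going on, the config file would be read (so the format string changes) while the min_dev value from it never takes effect.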
>I've been changing config files around (which accounts for the change in
>score perhaps), etc, so I ran it again twice with the old dbs (backups
>from earlier today):
Interesting. Both histograms look like min_dev=0.1. The "hole in the
middle" is my clue.
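
For anyone wondering why min_dev leaves a hole in the middle, here is a rough sketch. It assumes that min_dev drops every token whose score lies closer than min_dev to the neutral value 0.5, and that each histogram line is labeled by the lower bound of its bucket. The function names and the sample scores are made up for illustration; this is not the bogofilter source.

```python
ROBX = 0.415  # score given in this thread to tokens unknown to the wordlists


def filter_by_min_dev(scores, min_dev):
    """Keep only scores at least min_dev away from the neutral value 0.5."""
    return [p for p in scores if abs(p - 0.5) >= min_dev]


def histogram(scores):
    """Count scores in ten buckets; index 4 covers 0.40 up to 0.50, and so on."""
    counts = [0] * 10
    for p in scores:
        counts[min(int(p * 10), 9)] += 1
    return counts


# Made-up token scores: two unknown tokens at ROBX plus a few known ones.
scores = [0.05, ROBX, ROBX, 0.45, 0.55, 0.93]

kept = filter_by_min_dev(scores, min_dev=0.1)
print(histogram(scores))  # buckets 4 and 5 are populated
print(histogram(kept))    # buckets 4 and 5 are empty: the hole in the middle
```

Since 0.415 is only 0.085 away from 0.5, every unknown token falls inside the excluded band, which is why the 0.40 and 0.50 lines empty out.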
>BTW, I'm getting some VERY strange results on another message...
>I'll send info to the list shortly.
I've got a couple of other problems to deal with first. I'm afraid you'll
have to take a number :-(