Testing shows catastrophe

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Wed Jan 22 13:34:27 CET 2003


David Relson wrote:

> After your big training run, did you check the message counts in the word 
> lists?

No.

> A significant error was uncovered in the mime processing code that 
> affects training on mailboxes.  The error causes an incorrect .MSG_COUNT 
> value to be computed and stored in the wordlist.  This is likely to cause 
> incorrect spamicity scores because the scores use the ratio of a word's 
> occurrence to the number of messages.  If you still have the bad databases, 
> run the command "bogoutil -w /wordlist/dir .MSG_COUNT" to display the 
> counts for .MSG_COUNT.

Well, I constantly rebuild with some changes. Not really
successful so far. Now I am back to defaults.

[3.14 at pi ~]$ bogoutil -w ~/.bogofilter .MSG_COUNT
                       spam   good
.MSG_COUNT               14   5410

Looks really bad. So does this mean I should split with
formail when training?

> The quickest way to "see" why bogofilter classified a message as it did 
> (when using Robinson or Robinson-Fisher) is to generate the histogram using 
> "-vv" on the command line.

Well, that did not help me too much.

> As a second detail, your use of "min_dev=0.2" will ignore all words with 
> spamicities between 0.3 and 0.7.  This _may_ be a bit extreme.  I use 
> "min_dev=0.1" with a high degree of success.

If you look at my mail again, I tried 0.2, 0.1, and 0.0 with
Fisher as well as plain default.
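
For reference, the effect of min_dev as described in the quoted text
can be sketched in a few lines of Python. This is only an illustration
of the idea (drop words whose spamicity is within min_dev of 0.5); the
function name, the sample tokens, and their scores are invented, and
how bogofilter treats values exactly on the boundary is an assumption
here.

```python
def significant_tokens(spamicities, min_dev):
    """Keep only tokens whose spamicity deviates from the neutral
    value 0.5 by at least min_dev (boundary handling is assumed)."""
    return {tok: p for tok, p in spamicities.items()
            if abs(p - 0.5) >= min_dev}


# Invented tokens and scores, for illustration only.
scores = {"viagra": 0.95, "meeting": 0.05, "hello": 0.62,
          "free": 0.35, "the": 0.52}

for min_dev in (0.2, 0.1, 0.0):
    kept = significant_tokens(scores, min_dev)
    print(min_dev, sorted(kept))
```

The larger min_dev is, the fewer tokens take part in the score, which
is why 0.2 may discard evidence that 0.1 or 0.0 would still use.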

> The graham problem, i.e. "Internal error in graham.c:158]", is caused by 
> bogofilter choosing a long mime boundary as one of the 15 extrema 
> tokens.  That flaw has been in 0.9.1.2 since it was released.  I can send 
> you a patch for it.

I can only have RPMs installed. But if I solve the above
problem, I will try Fisher.
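
As a rough picture of the selection step mentioned in the quoted text,
the following Python sketch picks the tokens whose spamicity lies
farthest from 0.5. It does not reproduce the graham.c error itself; it
only shows how a long MIME boundary string could end up among the
chosen tokens. The function name, the boundary string, and all scores
are hypothetical.

```python
def extrema_tokens(spamicities, limit=15):
    """Return up to `limit` (token, spamicity) pairs, ordered by
    distance from the neutral value 0.5, farthest first."""
    ranked = sorted(spamicities.items(),
                    key=lambda item: abs(item[1] - 0.5),
                    reverse=True)
    return ranked[:limit]


# Hypothetical scores; the long token stands in for a MIME boundary.
boundary = "----=_NextPart_000_0001_01C2C21B.ABCDEF00"
scores = {boundary: 0.99, "viagra": 0.95, "meeting": 0.05, "hello": 0.6}

for token, p in extrema_tokens(scores, limit=2):
    print(f"{p:.2f}  {token}")
```

Nothing in this selection looks at token length, so an unusually long
token is chosen like any other if its score is extreme enough.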

> Plans are to release 0.10.1 in the next day or so.  I haven't yet gotten to 
> the "-2" and "-3" options

That is not all that important, since there is a workaround.

> nor have I verified/fixed some other bug reports.

That is probably more important.

> If you can update from cvs, that would be a good thing to do.

Not really.

> If your 
> problems happen with the newer code, I _really_ want to hear about it.  If 
> you can't use cvs, I can build a tarball of 0.10.0.cvs and send it to you.

RPM would do:-))

pi




