Testing shows katastrophy

David Relson relson at osagesoftware.com
Wed Jan 22 13:25:17 CET 2003


Boris,

"Katastrophy" sounds like the right word.  I guess I should remove the 
special check for univie.ac.at <grin>.

On a more serious note, 0.10.0 _is_ beta software.  It has lots of new 
features.  Prior to the release, the testing was limited.  I know I've been 
using it successfully for classifying my incoming messages.

The early testing of 0.10 has been good.  People have been using it, 
encountering problems, reporting them, and they're getting fixed, and the 
fixes are going into CVS.

After your big training run, did you check the message counts in the word 
lists? A significant error was uncovered in the MIME processing code that 
affects training on mailboxes.  The error causes an incorrect .MSG_COUNT 
value to be computed and stored in the wordlist.  This is likely to cause 
incorrect spamicity scores because the scores use the ratio of a word's 
occurrence to the number of messages.  If you still have the bad databases, 
run the command "bogoutil -w /wordlist/dir .MSG_COUNT" to display the 
counts for .MSG_COUNT.
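
To illustrate why a wrong .MSG_COUNT skews the scores, here is a rough 
sketch in Python.  It is not bogofilter's actual code: the function name, 
the simplified formula, and the numbers are all made up for illustration.  
It only assumes that each list's word count is divided by that list's 
message count before the two are compared.

```python
def word_spamicity(spam_hits, good_hits, spam_msgs, good_msgs):
    """Simplified per-word estimate (illustrative, not bogofilter's formula).

    Each word count is normalized by the message count of its own list,
    so a wrong message count changes the result even if the word counts
    themselves are right.
    """
    spam_freq = spam_hits / spam_msgs
    good_freq = good_hits / good_msgs
    return spam_freq / (spam_freq + good_freq)


# Same word counts in both cases; only the good-list message count differs.
correct = word_spamicity(50, 50, spam_msgs=1000, good_msgs=1000)
skewed = word_spamicity(50, 50, spam_msgs=1000, good_msgs=4000)

print(f"correct counts: {correct:.2f}")
print(f"inflated good message count: {skewed:.2f}")
```

In this made-up case a neutral word ends up looking spammy purely because 
one message count is too large.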

The quickest way to "see" why bogofilter classified a message as it did 
(when using Robinson or Robinson-Fisher) is to generate the histogram using 
"-vv" on the command line.

As a second detail, your use of "min_dev=0.2" will ignore all words with 
spamicities between 0.3 and 0.7.  This _may_ be a bit extreme.  I use 
"min_dev=0.1" with a high degree of success.
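
As a rough picture of what min_dev does, the following Python sketch drops 
every word whose spamicity is closer to 0.5 than min_dev.  The function 
name and the sample words are hypothetical, and how bogofilter treats a 
value lying exactly on the boundary is an assumption here.

```python
def is_ignored(spamicity, min_dev):
    """True if the word is too close to the neutral value 0.5 to count.

    Assumption: values exactly at the boundary are kept; the real
    implementation may treat the boundary differently.
    """
    return abs(spamicity - 0.5) < min_dev


# Hypothetical words with made-up spamicities.
words = {"viagra": 0.90, "meeting": 0.10, "free": 0.65,
         "offer": 0.35, "the": 0.45}

kept_02 = sorted(w for w, p in words.items() if not is_ignored(p, 0.2))
kept_01 = sorted(w for w, p in words.items() if not is_ignored(p, 0.1))

print("min_dev=0.2 keeps:", kept_02)
print("min_dev=0.1 keeps:", kept_01)
```

With min_dev=0.1 the ignored band narrows to 0.4 through 0.6, so moderately 
informative words such as "free" and "offer" in this example still 
contribute to the score.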

The Graham problem, i.e. "Internal error in graham.c:158]", is caused by 
bogofilter choosing a long MIME boundary as one of the 15 extrema 
tokens.  That flaw has been in 0.9.1.2 since it was released.  I can send 
you a patch for it.
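
For readers unfamiliar with the term, "extrema" here means the tokens whose 
spamicity lies farthest from 0.5.  A minimal Python sketch of that 
selection follows; it is not the code in graham.c, and the function name 
and sample tokens (including the boundary-like string) are invented for 
illustration.

```python
def pick_extrema(spamicities, max_tokens=15):
    """Return up to max_tokens tokens, farthest from 0.5 first."""
    ranked = sorted(spamicities.items(),
                    key=lambda item: abs(item[1] - 0.5),
                    reverse=True)
    return [token for token, _ in ranked[:max_tokens]]


# Made-up message: 20 ordinary words plus one long boundary-like token.
sample = {f"word{i}": 0.30 + i * 0.02 for i in range(20)}
sample["----=_NextPart_000_0001_01C2C1F3.ABCDEF0123456789"] = 0.99

extrema = pick_extrema(sample)
print(len(extrema), "tokens selected; first:", extrema[0])
```

Nothing in the selection looks at token length, so a very long token can 
be picked just like any other if its score is extreme enough.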

Plans are to release 0.10.1 in the next day or so.  I haven't yet gotten to 
the "-2" and "-3" options nor have I verified/fixed some other bug reports.

If you can update from CVS, that would be a good thing to do.  If your 
problems happen with the newer code, I _really_ want to hear about it.  If 
you can't use CVS, I can build a tarball of 0.10.0.cvs and send it to you.

David




