Testing shows catastrophe
Boris 'pi' Piwinger
3.14 at logic.univie.ac.at
Wed Jan 22 13:34:27 CET 2003
David Relson wrote:
> After your big training run, did you check the message counts in the word
> lists?
No.
> A significant error was uncovered in the mime processing code that
> affects training on mailboxes. The error causes an incorrect .MSG_COUNT
> value to be computed and stored in the wordlist. This is likely to cause
> incorrect spamicity scores because the scores use the ratio of a word's
> occurrence to the number of messages. If you still have the bad databases,
> run the command "bogoutil -w /wordlist/dir .MSG_COUNT" to display the
> counts for .MSG_COUNT.
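To illustrate why a wrong .MSG_COUNT matters, here is a minimal sketch of a per-word score built from occurrence ratios. It is a simplified Graham-style estimate and not bogofilter's actual code; the function names and the sample numbers are hypothetical.

```python
def occurrence_ratio(word_count, msg_count):
    """Fraction of messages containing the word, capped at 1.0."""
    if msg_count <= 0:
        raise ValueError("msg_count must be positive")
    return min(1.0, word_count / msg_count)


def simple_spamicity(spam_hits, good_hits, spam_msgs, good_msgs):
    """Simplified per-word spamicity (assumption: not bogofilter's formula).

    spam_msgs and good_msgs play the role of the .MSG_COUNT values.
    """
    spam_ratio = occurrence_ratio(spam_hits, spam_msgs)
    good_ratio = occurrence_ratio(good_hits, good_msgs)
    total = spam_ratio + good_ratio
    if total == 0:
        return 0.5  # word never seen: neutral
    return spam_ratio / total


# Hypothetical word seen in 10 spam and 10 good messages.
correct = simple_spamicity(10, 10, spam_msgs=100, good_msgs=100)
# Same word counts, but the spam message count was under-recorded.
skewed = simple_spamicity(10, 10, spam_msgs=10, good_msgs=100)
print(correct, skewed)
```

With the under-recorded spam count, the same word looks far more spammy, although nothing about the word itself has changed.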
Well, I constantly rebuild with some changes. Not really
successful so far. Now I am back to defaults.
[3.14 at pi ~]$ bogoutil -w ~/.bogofilter .MSG_COUNT
                 spam   good
.MSG_COUNT         14   5410
Looks really bad. So does this mean I should split with
formail when training?
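One way to sanity-check the stored counts is to count the messages in each training mailbox independently and compare the result with .MSG_COUNT. A minimal sketch, assuming the training data is in mbox format and using Python's standard mailbox module (not part of bogofilter); the path in the usage note is hypothetical.

```python
import mailbox


def count_messages(mbox_path):
    """Return the number of messages in an mbox file.

    Assumption: the file is a well-formed mbox, with each message
    starting at a "From " separator line.
    """
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return len(box)
    finally:
        box.close()
```

For example, count_messages("/path/to/spam.mbox") should roughly match the spam column of .MSG_COUNT if every message in that mailbox was registered once.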
> The quickest way to "see" why bogofilter classified a message as it did
> (when using Robinson or Robinson-Fisher) is to generate the histogram using
> "-vv" on the command line.
Well, that did not help me too much.
> As a second detail, your use of "min_dev=0.2" will ignore all words with
> spamicities between 0.3 and 0.7. This _may_ be a bit extreme. I use
> "min_dev=0.1" with a high degree of success.
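The effect of min_dev can be sketched as a filter on each token's distance from the neutral value 0.5. This is an illustration only, not bogofilter's code; the function name and sample tokens are hypothetical, and whether the boundary itself is included is an assumption.

```python
def significant_tokens(spamicities, min_dev=0.1):
    """Keep tokens whose spamicity is at least min_dev away from 0.5.

    Assumption: the boundary is treated as inclusive here.
    """
    return {
        token: p
        for token, p in spamicities.items()
        if abs(p - 0.5) >= min_dev
    }


# Hypothetical spamicities for four tokens.
sample = {"viagra": 0.95, "meeting": 0.05, "offer": 0.65, "the": 0.45}
print(sorted(significant_tokens(sample, min_dev=0.2)))
print(sorted(significant_tokens(sample, min_dev=0.1)))
```

With min_dev=0.2 only the two strongly indicative tokens survive; with min_dev=0.1 the mildly spammy "offer" is scored as well.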
If you look at my mail again, I tried 0.2, 0.1, and 0.0 with
Fisher as well as plain default.
> The graham problem, i.e. "Internal error in graham.c:158]", is caused by
> bogofilter choosing a long mime boundary as one of the 15 extrema
> tokens. That flaw has been in 0.9.1.2 since it was released. I can send
> you a patch for it.
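For context, a Graham-style classifier scores a message from the tokens whose spamicity lies furthest from 0.5. A minimal sketch of that selection step follows; it is not the code in graham.c, and the function name is hypothetical.

```python
def most_extreme(spamicities, n=15):
    """Return the n (token, spamicity) pairs furthest from 0.5.

    Any token can be selected, including an unusually long one such as
    a MIME boundary string, if its spamicity is extreme enough.
    """
    ranked = sorted(
        spamicities.items(),
        key=lambda item: abs(item[1] - 0.5),
        reverse=True,
    )
    return ranked[:n]
```

Nothing in this selection looks at token length, which is consistent with a long boundary string ending up among the extrema.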
I can only have RPMs installed. But if I solve the above
problem I would try Fisher.
> Plans are to release 0.10.1 in the next day or so. I haven't yet gotten to
> the "-2" and "-3" options
That is not too important, since there is a workaround.
> nor have I verified/fixed some other bug reports.
That is probably more important.
> If you can update from cvs, that would be a good thing to do.
Not really.
> If your
> problems happen with the newer code, I _really_ want to hear about it. If
> you can't use cvs, I can build a tarball of 0.10.0.cvs and send it to you.
An RPM would do :-))
pi