Testing shows catastrophe

David Relson relson at osagesoftware.com
Wed Jan 22 16:35:17 CET 2003


At 10:16 AM 1/22/03, Boris 'pi' Piwinger wrote:

>Matthias Andree wrote:
>
> >> [3.14 at pi ~]$ bogoutil -w ~/.bogofilter .MSG_COUNT
> >>                        spam   good
> >> .MSG_COUNT             4186  15000
> >>
> >> Looks much better; the last number is 4 too big, though.
> >> Starting test.
> >
> > 4 too high? Are all of the input files in proper mbox format, i.e. with
> > "From " strings properly escaped?
>
>Yes, I used formail -es to make sure it is that way. mail -f
>and grep -c '^From ' agree on the number 14996.
>
>pi

pi,

If you train with "-v" on the command line, bogofilter will display
message and word counts.  That gives a quick consistency check when
training with a mailbox.  I have a script that does (roughly):

         grep -c "^From " "$1.mbx"
         bogofilter -v -"$2" < "$1.mbx"

Where $1 and $2 are typically "spam" and "s", or "good" and "n".
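
A slightly more defensive version of that check, as a rough sketch only:
the script name check.sh, the argument checks and the mbox_count function
are hypothetical additions, not the script mentioned above.  It assumes
the mailbox is named after the first argument with a .mbx suffix and that
bogofilter is on the PATH.

```shell
#!/bin/sh
# Hypothetical sketch of the training consistency check.
# Usage: check.sh spam s   (or)   check.sh good n
# Assumes the mailbox is "$1.mbx" and bogofilter is on the PATH.

# Count mbox separator lines, i.e. lines that begin with "From ".
mbox_count() {
    grep -c '^From ' "$1"
}

main() {
    if [ $# -ne 2 ]; then
        echo "usage: $0 {spam|good} {s|n}" >&2
        return 2
    fi
    case $2 in
        s|n) ;;
        *) echo "second argument must be s or n" >&2; return 2 ;;
    esac
    echo "From lines: $(mbox_count "$1.mbx")"
    # With -v, bogofilter prints its own message and word counts;
    # its message count should match the number printed above.
    bogofilter -v -"$2" < "$1.mbx"
}

# Only run when arguments are given, so the functions can also be sourced.
if [ $# -gt 0 ]; then
    main "$@"
fi
```

Rejecting anything other than "s" or "n" up front avoids accidentally
running bogofilter with an unintended option against the real wordlists.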

It sounds like somewhere in that mailbox of 14,996 messages there are 4
that bogofilter thinks contain two messages each.  If you can isolate them,
we can take a look at them.
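
One possible way to isolate them, sketched here as an assumption rather
than a tested procedure: split the mailbox into one file per message with
awk, using the same "^From " separator that grep counts.  The split_mbox
function and the msg.NNNNN file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: split an mbox into one file per message
# (msg.00001, msg.00002, ...) inside the given output directory.
# Assumes every message starts with a "From " line; any text before
# the first separator is skipped.
split_mbox() {
    mbox=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    awk -v dir="$outdir" '
        /^From / {
            n++
            if (out) close(out)   # avoid running out of file descriptors
            out = sprintf("%s/msg.%05d", dir, n)
        }
        out { print > out }
    ' "$mbox"
}
```

For example, "split_mbox good.mbx /tmp/good-split" would produce one file
per message.  Each piece could then be fed to "bogofilter -v" on its own,
preferably against a scratch copy of the wordlists so the real database is
not retrained, to see which pieces it counts as two messages.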





More information about the Bogofilter mailing list