Testing shows katastrophy
David Relson
relson at osagesoftware.com
Wed Jan 22 16:35:17 CET 2003
At 10:16 AM 1/22/03, Boris 'pi' Piwinger wrote:
>Matthias Andree wrote:
>
> >> [3.14 at pi ~]$ bogoutil -w ~/.bogofilter .MSG_COUNT
> >> spam good
> >> .MSG_COUNT 4186 15000
> >>
> >> Looks much better, the last number is 4 too big, though.
> >> Starting test.
> >
> > 4 too high? Are all of the input files in proper mbox format, i.e. with
> > "From " strings properly escaped?
>
>Yes, I used formail -es to make sure it is that way. mail -f
>and grep -c '^From ' agree on the number 14996.
>
>pi
pi,
If you train with a "-v" in the command line, bogofilter will display
message and word counts. That gives a quick consistency check when
training with a mailbox. I have a script that does (roughly):
    grep "^From " "$1.mbx" | wc -l
    bogofilter -v "-$2" < "$1.mbx"
Where $1 and $2 are typically "spam" and "s", or "good" and "n".
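For anyone who wants to reuse the idea, here is a slightly fuller sketch of such a script. The function names are made up for illustration and the ".mbx" suffix is simply carried over from the lines above; the bogofilter invocation itself is unchanged.

```shell
#!/bin/sh
# Sketch only: count_from_lines and train_and_check are illustrative names,
# not part of bogofilter.

# Count mbox separator lines ("From " at the start of a line).
count_from_lines() {
    grep -c '^From ' "$1"
}

# Print the separator count, then train bogofilter verbosely so the
# two numbers can be compared by eye.
#   $1 = mailbox base name (e.g. spam or good)
#   $2 = s (spam) or n (good)
train_and_check() {
    echo "From lines: $(count_from_lines "$1.mbx")"
    bogofilter -v "-$2" < "$1.mbx"
}
```

Called as "train_and_check spam s" or "train_and_check good n", this prints the separator count first and then whatever bogofilter reports, so a mismatch is easy to spot.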
It sounds like somewhere in that mailbox of 15,000 messages there are 4
that bogofilter thinks contain two messages. If you can isolate them, we
can take a look at them.
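One possible way to narrow the search, sketched under assumptions: cut the mailbox into chunks of N messages, run the same count comparison on each chunk, and then re-split only the chunks where the numbers disagree. The helper below, split_mbox, is hypothetical and not part of bogofilter; it splits purely on lines beginning with "From " and drops anything before the first such line.

```shell
#!/bin/sh
# Sketch only: split_mbox is a hypothetical helper.
# split_mbox FILE PREFIX N
#   Writes the messages of FILE into PREFIX.000, PREFIX.001, ...
#   with at most N messages per output file.
split_mbox() {
    awk -v prefix="$2" -v n="$3" '
        /^From / {
            # Start a new chunk every n messages.
            if (count % n == 0) {
                if (out) close(out)
                out = sprintf("%s.%03d", prefix, count / n)
            }
            count++
        }
        # Copy every line once the first separator has been seen.
        out { print > out }
    ' "$1"
}
```

For example, "split_mbox good.mbx chunk 1000" would leave fifteen files of 1,000 messages each for a 15,000-message mailbox. Since training with bogofilter registers the messages again, it is probably wise to run the per-chunk checks against a throwaway copy of the wordlists.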