bogofilter producing poor results
William Ono
a1bformk at tinny.soundwave.net
Tue Nov 12 16:32:37 CET 2002
> On 20021111 (Mon) at 1736:35 -0800, William Ono wrote:
> > $ bogofilter -r -s -v < f-corpus.spam
> > # 626200 words, 1181 messages
> > $ bogofilter -r -n -v < f-corpus.ham
> > # 286022 words, 855 messages
On Tue, Nov 12, 2002 at 07:27:50AM -0500, Greg Louis wrote:
> That's a small training set. Bogofilter, at least in my hands, began
> to perform better (around 5% false negatives and <1% false positives)
> when my training set grew to about 4300 nonspam and 1800 spam (I had no
> spam archive to start with, but I used old nonspams; hence the
> lopsidedness). Now I'm at 6500 and 7200 respectively, and I'm getting
> around 2% false negatives and less than 0.5% false positives these
> days.
So, if I leave the magic values alone, from the volume of email that I
receive it looks as though I should see better performance after
feeding in a total of about two years' worth of email. Hmm. I think
I'd best go re-read the Robinson paper with a pot of coffee and see what
I remember from my (very few) statistics and probability courses, and
get to tuning those magic values.
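
For a rough idea of the wait involved, here is a minimal sketch of the arithmetic. The current counts come from the bogofilter -v output quoted above and the targets are the corpus sizes at which Greg reported better results. The daily volumes and the helper name days_to_reach are hypothetical placeholders, not figures from this thread; substitute your own.

```python
import math


def days_to_reach(current, target, per_day):
    """Days of incoming mail needed to grow a corpus from current to target."""
    remaining = max(0, target - current)
    if remaining == 0:
        return 0
    if per_day <= 0:
        raise ValueError("per_day must be positive")
    return math.ceil(remaining / per_day)


# Message counts reported by bogofilter -v in the quoted output.
current_spam, current_ham = 1181, 855

# Corpus sizes at which Greg Louis reported improved results.
target_spam, target_ham = 1800, 4300

# ASSUMPTION: hypothetical daily volumes, not taken from the thread.
spam_per_day, ham_per_day = 5.0, 5.0

spam_days = days_to_reach(current_spam, target_spam, spam_per_day)
ham_days = days_to_reach(current_ham, target_ham, ham_per_day)

print(f"spam: {target_spam - current_spam} more messages, about {spam_days} days")
print(f"ham:  {target_ham - current_ham} more messages, about {ham_days} days")
```

The slower of the two corpora sets the overall wait, so the ham figure is the one that matters with these numbers.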
Thanks kindly for the response. It's good to know that I can expect
better results than I'm seeing so far.
--
William Ono <a1bformk at tinny.soundwave.net>
PGP 2048R/93BA6AFD E3 64 C5 43 3E B3 2D A6 C6 D7 E3 45 90 24 78 DE