bogofilter producing poor results

William Ono a1bformk at tinny.soundwave.net
Tue Nov 12 16:32:37 CET 2002


> On 20021111 (Mon) at 1736:35 -0800, William Ono wrote:                       
> > $ bogofilter -r -s -v < f-corpus.spam                                      
> > # 626200 words, 1181 messages                                              
> > $ bogofilter -r -n -v < f-corpus.ham                                       
> > # 286022 words, 855 messages                                               
                                                                               
On Tue, Nov 12, 2002 at 07:27:50AM -0500, Greg Louis wrote:                    
> That's a small training set.  Bogofilter, at least in my hands, began        
> to perform better (around 5% false negatives and <1% false positives)        
> when my training set grew to about 4300 nonspam and 1800 spam (I had no      
> spam archive to start with, but I used old nonspams; hence the               
> lopsidedness).  Now I'm at 6500 and 7200 respectively, and I'm getting       
> around 2% false negatives and less than 0.5% false positives these           
> days.                                                                        
                                                                               
So, if I leave the magic values alone, from the volume of email that I         
receive it looks as though I should see better performance after               
feeding in a total of about two years' worth of email.  Hmm.  I think          
I'd best go re-read the Robinson paper with a pot of coffee and see what       
I remember from my (very few) statistics and probability courses, and
get to tuning those magic values.                                              
                                                                               
Thanks kindly for the response.  It's good to know that I can expect           
better results than I'm seeing so far.                                         
                                                                               
--                                                                             
William Ono <a1bformk at tinny.soundwave.net>                                     
PGP 2048R/93BA6AFD E3 64 C5 43 3E B3 2D A6    C6 D7 E3 45 90 24 78 DE



