man bogoutil

Greg Louis glouis at dynamicro.on.ca
Fri Nov 22 13:32:54 CET 2002


On 20021122 (Fri) at 0938:17 +0100, Boris 'pi' Piwinger wrote:
> Greg Louis <glouis at dynamicro.on.ca> wrote:
> 
> >> Actually, I just today removed the -r which gave pretty poor
> >> results for me (missing a lot). I don't have statistics on
> >> that, though.
> >
> >Surprising.  If you have a reasonably large training set and could
> >_get_ stats, it would be good to see them, 'coz Robinson's approach is
> >working quite a bit better than the original for most people who've
> >compared them.  
> 
> OK, sorry, I don't have time to work out statistics. I am
> willing to share my training sets (continually growing),
> though.

I'd very much like to take you up on that offer.  Can you make them
available on an ftp site somewhere?  I'd need the .db files but also a
lot (at least 1200) of emails that aren't in the training set but come
from the same population as those that are, to use as a test corpus.

> Since I changed, the number of spam missed dropped
> significantly. So my feeling proved right for me.

I suspect that tuning the Robinson version's parameters would have done
at least as well, but that's why I'd like to try the comparison.  If
there is a type of email for which Graham's original calculation really
works better, we should try to learn what's different about it.
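
For anyone following the thread, here is a rough sketch of the two ways of
combining per-token spam probabilities as I understand them from the published
descriptions (Graham's "A Plan for Spam" and Robinson's geometric-mean
proposal). It is not bogofilter's actual code: the function names, the
`keep=15` cutoff default, and the sample probabilities are assumptions for
illustration, and every input is assumed to lie strictly between 0 and 1.

```python
import math


def graham_score(probs, keep=15):
    """Graham-style combination (illustrative sketch, not bogofilter's code).

    Keeps the `keep` probabilities farthest from 0.5 and combines them
    with the naive-Bayes product formula.
    """
    extreme = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:keep]
    spam = math.prod(extreme)
    ham = math.prod(1.0 - p for p in extreme)
    return spam / (spam + ham)


def robinson_score(probs):
    """Robinson-style geometric-mean combination (illustrative sketch).

    Uses every token: P measures "spamminess", Q measures "hamminess",
    and the result is rescaled from [-1, 1] to [0, 1].
    """
    n = len(probs)
    p_val = 1.0 - math.prod(1.0 - p for p in probs) ** (1.0 / n)
    q_val = 1.0 - math.prod(probs) ** (1.0 / n)
    s_val = (p_val - q_val) / (p_val + q_val)
    return (1.0 + s_val) / 2.0


if __name__ == "__main__":
    # Made-up per-token probabilities: two strongly spammy, one hammy.
    sample = [0.99, 0.99, 0.2]
    print("graham:  ", graham_score(sample))
    print("robinson:", robinson_score(sample))
```

On a mixed message like the sample, the Graham-style product is pushed hard
toward 1 by the two extreme tokens, while the geometric mean stays much closer
to the middle. That is one reason the spam cutoff would need retuning when
switching between the two before any comparison is fair.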

-- 
| G r e g  L o u i s          | gpg public key:      |
|   http://www.bgl.nu/~glouis |   finger greg at bgl.nu |



