man bogoutil
Greg Louis
glouis at dynamicro.on.ca
Fri Nov 22 13:32:54 CET 2002
On 20021122 (Fri) at 0938:17 +0100, Boris 'pi' Piwinger wrote:
> Greg Louis <glouis at dynamicro.on.ca> wrote:
>
> >> Actually, I just today removed the -r which gave pretty poor
> >> results for me (missing a lot). I don't have statistics on
> >> that, though.
> >
> >Surprising. If you have a reasonably large training set and could
> >_get_ stats, it would be good to see them, 'coz Robinson's approach is
> >working quite a bit better than the original for most people who've
> >compared them.
>
> OK, sorry, I don't have time to work out statistics. I am
> willing to share my training sets (continually growing),
> though.

I'd very much like to take you up on that offer. Can you make them
available on an ftp site somewhere? I'd need the .db files but also a
lot (at least 1200) of emails that aren't in the training set but come
from the same population as those that are, to use as a test corpus.

> Since I changed, the number of spam missed dropped
> significantly. So my feeling proved right for me.

I suspect that tuning the Robinson version's parameters would have done
at least as well, but that's why I'd like to try the comparison. If
there is a type of email for which Graham's original calculation really
works better, we should try to learn what's different about it.
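
For anyone following the comparison, here is a rough sketch of the two
calculations as they are usually described: Graham's naive-Bayes style
combination and Robinson's geometric-mean combination, with Robinson's
s and x as the tunable parameters. This is an illustration only, not
bogofilter's actual code; the function names and default values are
made up for the example, and Graham's selection of the most extreme
tokens is left out.

```python
import math

# Illustrative sketch only; names and defaults are assumptions,
# not taken from bogofilter's source.

def graham_combine(probs):
    """Graham-style combination of per-token spam probabilities.

    Assumes a non-empty list with every value strictly between 0 and 1.
    """
    prod_p = math.prod(probs)
    prod_q = math.prod(1.0 - p for p in probs)
    return prod_p / (prod_p + prod_q)

def robinson_f(p, n, s=1.0, x=0.5):
    """Robinson's adjusted token probability f(w).

    p: raw spam probability of the token
    n: number of times the token has been seen in training
    s: strength given to the prior x (tunable)
    x: assumed probability for a token never seen before (tunable)
    """
    return (s * x + n * p) / (s + n)

def robinson_combine(fs):
    """Robinson's combination of the adjusted probabilities.

    Assumes a non-empty list with every value strictly between 0 and 1.
    """
    n = len(fs)
    P = 1.0 - math.prod(1.0 - f for f in fs) ** (1.0 / n)
    Q = 1.0 - math.prod(fs) ** (1.0 / n)
    return (1.0 + (P - Q) / (P + Q)) / 2.0
```

With s large, rarely seen tokens are pulled strongly toward x; with s
small, they keep nearly their raw probability. That is the kind of
tuning meant above.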
--
| G r e g L o u i s | gpg public key: |
| http://www.bgl.nu/~glouis | finger greg at bgl.nu |