garbage removal and 'outsiders noise'
Greg Louis
glouis at dynamicro.on.ca
Wed Apr 16 22:51:48 CEST 2003
On 20030416 (Wed) at 1337:53 -0400, Jim Correia wrote:
> On Wednesday, April 16, 2003, at 01:22 PM, David Relson wrote:
>
> >His most recent test, "Bogofilter parameters(continued)", shows that
> >using different parameters can have a major effect in making
> >bogofilter more accurate.
> >
> >Conclusion, using a site's email to determine the best parameters for
> >bogofilter can have a _big_ effect.
> >
> Is it naive of me to be running bogofilter with the defaults?
The defaults were chosen some time ago; they don't work badly, but
chances are one can do a lot better.
> I'm running with spam/ham classification, cutoff of 0.95.
>
> (I notice that some of the false negatives are close to the cutoff, but
> most are numerically far from it, so perhaps I am answering my own
> question :-)
>
> This catches about 90% of my spam (I retrain -Ns with the false
> negatives) and haven't had a false positive yet.
Sounds as though you could lower the spam cutoff a bit without changing
anything else; that would increase the success rate catching spam (but
also the chance of getting some fp's -- if that happens, the spam
cutoff should go back up at least partway).
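A rough sketch of that trade-off, using made-up scores rather than
anything measured: the helper name evaluate_cutoff and both score lists
are hypothetical, and it assumes a message counts as spam when its score
is at or above the cutoff.

```python
def evaluate_cutoff(spam_scores, ham_scores, cutoff):
    """Return (fraction of spam caught, number of false positives).

    Assumes a message is classed as spam when its score >= cutoff.
    """
    caught = sum(1 for s in spam_scores if s >= cutoff)
    false_positives = sum(1 for s in ham_scores if s >= cutoff)
    return caught / len(spam_scores), false_positives


# Hypothetical scores for messages whose true class is already known.
spam_scores = [0.99, 0.97, 0.96, 0.93, 0.91, 0.88, 0.60, 0.40, 0.20, 0.10]
ham_scores = [0.01, 0.05, 0.10, 0.20, 0.35, 0.50, 0.89]

# Step the cutoff down and watch both numbers; back off once fp's appear.
for cutoff in (0.95, 0.90, 0.85):
    rate, fps = evaluate_cutoff(spam_scores, ham_scores, cutoff)
    print(f"cutoff {cutoff:.2f}: caught {rate:.0%} of spam, {fps} fp's")
```

With these invented numbers, going from 0.95 to 0.90 catches more spam
at no cost, while 0.85 lets the first fp through, which is the point at
which the cutoff should go back up.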
> At present there are 18659 good messages and 1569 spam messages in the
> respective wordlists.
The spam message count is a bit light. I wouldn't recommend trying to
optimize bogofilter's parameters (except spam cutoff) till you have at
least 5000 spams. Also, it might be wise to stop adding nonspams for a
while; we don't have much experience with bogofilter's performance with
extremely lopsided training databases, and theoretically it's better to
keep them more even (one list 2-3 times the size of the other wouldn't
worry me, but an order of magnitude is quite a difference).
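To put numbers on that, here is a small sketch using the counts quoted
above; the function name imbalance_ratio is only illustrative, and the
5000 figure is the rule of thumb from this message, not a bogofilter
setting.

```python
def imbalance_ratio(good_count, spam_count):
    """Return how many times larger the bigger wordlist is than the smaller."""
    larger = max(good_count, spam_count)
    smaller = min(good_count, spam_count)
    if smaller == 0:
        raise ValueError("both lists need at least one message")
    return larger / smaller


good, spam = 18659, 1569          # message counts quoted above
ratio = imbalance_ratio(good, spam)
spams_needed = max(0, 5000 - spam)  # shortfall against the 5000-spam rule of thumb

print(f"lists differ by {ratio:.1f}:1; {spams_needed} more spams to reach 5000")
```

That comes out at roughly 12:1, well past the 2-3x range that wouldn't
worry me.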
Hope that helps.
--
| G r e g L o u i s | gpg public key: finger |
| http://www.bgl.nu/~glouis | glouis at consultronics.com |
| http://wecanstopspam.org in signatures fights junk email |