best practices question
Jeremy Blosser
jblosser-bogofilter at firinn.org
Sat Sep 21 01:49:44 CEST 2002
On Sep 20, David Relson [relson at osagesoftware.com] wrote:
> Until I saw your posting about stress testing, the thought of adaptive use
> hadn't occurred to me.
Most of our spam blocking currently happens via Vipul's Razor. Once we've
tested bogofilter to our satisfaction, our implementation plan looks like:
- Continue to run everything through Vipul's, and use its opinion of a mail
to train bogofilter:
      if (razor->is_spam(msg))
          bogofilter -s
          drop msg
      else
          bogofilter -h
          deliver msg
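A minimal sketch of this first stage in Python, keeping the decision logic
separate from the trainer so it can be exercised without a live Razor or
bogofilter install (`is_spam` and `register` are hypothetical stand-ins for
the Razor check and the bogofilter invocation; the flags mirror the plan
above):

```python
import subprocess

def register_with_bogofilter(msg: bytes, label: str) -> None:
    """Train bogofilter on one message, using the flags from the
    plan above: -s registers spam, -h registers ham."""
    flag = "-s" if label == "spam" else "-h"
    subprocess.run(["bogofilter", flag], input=msg, check=True)

def route(msg: bytes, is_spam, register) -> str:
    """Stage 1: Razor's verdict drives delivery; bogofilter only learns.
    `is_spam` is the (hypothetical) Razor check, `register` the trainer."""
    if is_spam(msg):
        register(msg, "spam")
        return "drop"
    register(msg, "ham")
    return "deliver"
```

In production `register` would be `register_with_bogofilter`; in a test you
can pass plain callables and check the routing decisions directly.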
- Once this has created a bogofilter db of sufficient size, ask bogofilter
its opinion as well, and if the two don't agree, store the headers:
      bf = bogofilter->is_spam(msg)
      if (r = razor->is_spam(msg))
          bogofilter -s
          drop msg
      else
          bogofilter -h
          deliver msg
      if (r != bf)
          store_headers()
we'll then look at a sample of these and hopefully find that the
difference is because bogofilter is a lot smarter than Vipul's.
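The second stage can be sketched the same way: Razor still decides delivery
and trains bogofilter, but both verdicts are compared and a hypothetical
`store_headers` hook records any disagreement for later review:

```python
def route_and_compare(msg: bytes, razor_is_spam, bogo_is_spam,
                      register, store_headers) -> str:
    """Stage 2: Razor drives delivery and training; bogofilter's
    verdict is consulted only to log disagreements."""
    bf = bogo_is_spam(msg)
    r = razor_is_spam(msg)
    if r:
        register(msg, "spam")
        action = "drop"
    else:
        register(msg, "ham")
        action = "deliver"
    if r != bf:
        # the two filters disagree; keep the headers for inspection
        store_headers(msg)
    return action
```

The sample of stored headers is what tells you whether the disagreements
are Razor's mistakes or bogofilter's.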
- Assuming that is indeed what we find, we'll switch to letting bogofilter
drive:
      if (bogofilter->is_spam(msg))
          bogofilter -s
          drop msg
      else
          bogofilter -h
          deliver msg
and tell our users to let us know of any misclassifications so we can
retrain. That'll be the hard part, and the one that'll only work if
these algorithms really are good enough to keep the misclassifications to
a low number.
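The retraining step can be sketched as follows, assuming bogofilter's
combined correction flags (`-Ns` to move a message wrongly registered as
ham onto the spam list, `-Sn` for the reverse); check your bogofilter
version's man page, as older releases may lack them:

```python
import subprocess

def correction_flag(was_classified: str) -> str:
    """Pick the flag that undoes the wrong registration: a message
    wrongly filed as ham is re-registered as spam (-Ns), and a message
    wrongly filed as spam as ham (-Sn)."""
    return "-Ns" if was_classified == "ham" else "-Sn"

def retrain(msg: bytes, was_classified: str) -> None:
    """Apply a user-reported misclassification to the wordlist."""
    subprocess.run(["bogofilter", correction_flag(was_classified)],
                   input=msg, check=True)
```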
Obviously the original paper on this theory spoke of training it per user,
but that's just not an option in an org like ours, where the users are
telling IT "we're paying you to deal with this spam so we don't have to".
Hopefully it'll work in this environment as well. Results so far are
positive; our spam is pretty heterogeneous, as is our legit mail.
For summary digest subscription: bogofilter-digest-subscribe at aotto.com