bogofilter - feedback from client to mail server

Greg Louis glouis at dynamicro.on.ca
Sun Feb 23 18:12:56 CET 2003


On 20030223 (Sun) at 1150:15 -0500, Kevin Coyner wrote:

> This seems to be working well as it does feed my spamlist.db, and
> theoretically, all users who get their email filtered through bogofilter
> on this email server will benefit from that spamlist.db.  After all, I
> would think spam is not really unique to particular users.
> 
> That said, good email is very specific to individual users, so I'm
> wondering just how effective bogofilter will be for a group of users,
> none of whom are continually feeding it 'good emails' to build the
> goodlist.db?

Not very.  I'm running bogofilter for about 80 users at work; what I do
is collect copies of all the mail (about 1500 messages a day of which
400 or so are spam) and periodically run bogofilter to separate the
whole collection into three mailboxes: good, bad and unsure.  A very
quick review of good and bad usually suffices to confirm that no
changes are needed there; then I manually reclassify unsures into spam
and nonspam and train on those.  If any errors do turn up in good or
bad, of course they are included in the training.  This seems to be
working fairly well for me, at the cost of maybe 20 minutes' work twice
a week.
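
The three-way split described above can be sketched as a comparison
against two cutoffs. This is only an illustration, not bogofilter's own
code: the function names, the cutoff values 0.10 and 0.90 and the
message ids are assumptions made up for the example, and each score is
taken to be a spam probability between 0 and 1.

```python
def triage(score, ham_cutoff=0.10, spam_cutoff=0.90):
    """Return 'good', 'bad' or 'unsure' for one message's spam score.

    The cutoff defaults are hypothetical, chosen only for illustration.
    """
    if score >= spam_cutoff:
        return "bad"
    if score <= ham_cutoff:
        return "good"
    return "unsure"


def split_collection(scored_messages, ham_cutoff=0.10, spam_cutoff=0.90):
    """Sort (message_id, score) pairs into three mailboxes."""
    boxes = {"good": [], "bad": [], "unsure": []}
    for msg_id, score in scored_messages:
        boxes[triage(score, ham_cutoff, spam_cutoff)].append(msg_id)
    return boxes
```

Only the "unsure" box then needs message-by-message review; "good" and
"bad" get a quick scan for mistakes.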

Running for myself alone I see around 1% false negatives and hardly any
false positives; at work the best I can do is around 6% false
negatives, again with few false positives.  The problem is that
marketing and purchasing and sales people all have interests that sound
spammy at times.  Even our engineers subscribe to a bunch of
newsletters that are often hard to distinguish from spam.  So I have to
keep the thresholds low to keep false positives down, and of course
that lets more false negatives through.
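
The tradeoff can be made concrete with a small sketch. Everything here
is an assumption for illustration: the function name, the sample
scores, and the convention that a message is called spam when its score
is at or above `spam_cutoff` (so a less aggressive filter means a higher
cutoff in this sketch; other setups may define the threshold the other
way round).

```python
def error_rates(scores_and_labels, spam_cutoff):
    """Return (false_positive_rate, false_negative_rate) at one cutoff.

    scores_and_labels: list of (score, is_spam) pairs with known labels.
    """
    spam = [s for s, is_spam in scores_and_labels if is_spam]
    ham = [s for s, is_spam in scores_and_labels if not is_spam]
    false_neg = sum(1 for s in spam if s < spam_cutoff)   # spam let through
    false_pos = sum(1 for s in ham if s >= spam_cutoff)   # good mail flagged
    return false_pos / len(ham), false_neg / len(spam)


# Made-up scores: one piece of good mail (0.85) "sounds spammy".
sample = [(0.99, True), (0.97, True), (0.80, True), (0.55, True),
          (0.85, False), (0.40, False), (0.05, False), (0.01, False)]

aggressive = error_rates(sample, spam_cutoff=0.5)
cautious = error_rates(sample, spam_cutoff=0.9)
```

On this toy data the aggressive setting flags the spammy-sounding good
message, while the cautious one spares it and lets half the spam
through.  At the figures quoted above (about 400 spam a day, around 6%
false negatives) that works out to roughly 24 spam messages a day
reaching inboxes.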

> Is bogofilter really designed to be used by just one user?  Or will it
> do pretty well with a well-fed spamlist.db but essentially a pretty
> thin goodlist.db?

It works best if you don't let the training db get too lopsided.
Bayesian classification, on which bogofilter is based, works better
when the good email is limited in scope, so it generally does better
for one user than for many.  My users at work seem quite happy to be
getting about a fifth of the spam they used to get, and even happier
when I point out that without filtering they'd be getting two to three
times as much as they used to.
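
One way to see why a lopsided training db hurts is a simplified
per-token estimate of the kind used in naive Bayesian spam filters.
This is a sketch under stated assumptions, not bogofilter's actual
calculation: the function name, the formula (relative frequency in spam
divided by the sum of the relative frequencies in spam and good mail)
and all the counts are made up for the example.

```python
def token_spam_prob(bad_count, good_count, n_bad, n_good):
    """Simplified spam probability for one token.

    bad_count / good_count: messages containing the token in each list.
    n_bad / n_good: total messages trained into each list.
    """
    bad_freq = bad_count / n_bad
    good_freq = good_count / n_good
    if bad_freq + good_freq == 0:
        return 0.5  # token never seen: no evidence either way
    return bad_freq / (bad_freq + good_freq)


# Thin good list (10 messages): one extra good message swings the result.
thin_before = token_spam_prob(20, 0, n_bad=1000, n_good=10)
thin_after = token_spam_prob(20, 1, n_bad=1000, n_good=10)

# Balanced lists (1000 each): one message barely moves it.
balanced_before = token_spam_prob(20, 99, n_bad=1000, n_good=1000)
balanced_after = token_spam_prob(20, 100, n_bad=1000, n_good=1000)
```

With the thin good list a single message moves the token from "certain
spam" to "probably good"; with balanced lists the same single message
changes the estimate by a fraction of a percent.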

-- 
| G r e g  L o u i s          | gpg public key:      |
|   http://www.bgl.nu/~glouis |   finger greg at bgl.nu |
| Help free our mailboxes. Include                   |
|        http://wecanstopspam.org in your signature. |



