garbage removal and 'outsiders noise'

Greg Louis glouis at dynamicro.on.ca
Fri Apr 18 01:43:55 CEST 2003


On 20030416 (Wed) at 1638:58 -0500, Shawn Barnhart wrote:
> 
> ----- Original Message -----
> From: "Jim Correia" <jim.correia at pobox.com>
> To: <bogofilter at aotto.com>
> 
> > For people running in -u mode, do you run all of your mail through
> > bogofilter, or do you sift out lists and other whitelist candidates first?
> 
> Bogofilter first.
> 
> I used to have to whitelist when I used SpamAssassin because it was too
> aggressive with false positives.
> 
> I haven't had a single fp with bogofilter yet and a fairly good fn rate (10%
> or so), and those I rescore.
> 
To me, 10% fn would be a big number.

I don't run with -u; I train manually.  I copy all mail to a single
mbox file and periodically use bogofilter to split it into three:
spam, nonspam, and unsure.  I review these three manually, producing
sptrain and nstrain (spam and nonspam) files that contain all the
unsures (which were delivered) plus any (rare) classification errors.
I then train with the sptrain and nstrain files.
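
The manual-training routine described above can be sketched roughly as
follows.  This is only an illustration, not bogofilter's own code: the
function names, the (msg_id, score) input, and the cutoff values 0.10
and 0.95 are all assumptions made up for the example, and the scores
stand in for whatever spamicity bogofilter reports.

```python
def split_three(scored, ham_cutoff=0.10, spam_cutoff=0.95):
    """Sort (msg_id, score) pairs into spam / nonspam / unsure buckets.

    The cutoff values are illustrative assumptions, not tuned settings.
    """
    buckets = {"spam": [], "nonspam": [], "unsure": []}
    for msg_id, score in scored:
        if score >= spam_cutoff:
            buckets["spam"].append(msg_id)
        elif score <= ham_cutoff:
            buckets["nonspam"].append(msg_id)
        else:
            buckets["unsure"].append(msg_id)
    return buckets


def training_sets(buckets, reviewed):
    """Build sptrain/nstrain: every unsure plus every misclassified message.

    `reviewed` maps msg_id -> "spam" or "nonspam", as decided by hand.
    """
    train = {"spam": [], "nonspam": []}
    for bucket, ids in buckets.items():
        for msg_id in ids:
            truth = reviewed[msg_id]
            if bucket == "unsure" or bucket != truth:
                train[truth].append(msg_id)
    return train["spam"], train["nonspam"]


# Made-up scores and review decisions, purely for illustration.
scored = [("a", 0.99), ("b", 0.02), ("c", 0.50), ("d", 0.97), ("e", 0.05)]
reviewed = {"a": "spam", "b": "nonspam", "c": "spam",
            "d": "nonspam", "e": "spam"}

buckets = split_three(scored)
sptrain, nstrain = training_sets(buckets, reviewed)
```

Correctly classified messages ("a" and "b" here) are left out of both
training sets, which is what keeps this a train-on-unsure-and-error
scheme rather than train-on-everything.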

This works well for me: I now get essentially no fp (one or two
mailing-list messages in two months) and around 1% fn.  At work, where
I have about 80 users, this same technique gives me about 0.05% fp
(roughly 3 a week, which I deliver when I find them) and about 2% fn.
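
As a rough cross-check of the work figures, the two numbers quoted
(0.05% fp, roughly 3 a week) imply a weekly mail volume.  The sketch
below assumes the fp rate is measured against all mail handled; the
post does not say which denominator is used, so treat the result as an
order-of-magnitude estimate only.

```python
from fractions import Fraction

fp_rate = Fraction(5, 10000)   # 0.05%, as quoted in the post
fp_per_week = 3                # "roughly 3 a week", as quoted
users = 80                     # "about 80 users", as quoted

# Assumption: the fp rate is taken over all mail handled per week.
implied_weekly_volume = fp_per_week / fp_rate
per_user = implied_weekly_volume / users

print(int(implied_weekly_volume), float(per_user))  # prints "6000 75.0"
```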

Of course, this level of performance required tuning bogofilter's s and
min_dev parameters with local corpora of messages.  I could raise the
spam-cutoff threshold and cut out the fp, at the cost of about doubling
the fn, but so far it's been unnecessary to do that.  Experience shows
that the few fp are generally personal mail, which people don't mind
getting a few hours -- or a day -- late, once the situation is
explained.  As bogofilter and its training have improved, the fp rate
has been falling, so I'm hopeful that this inconvenience is temporary.
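
The cutoff trade-off mentioned above can be illustrated with a small
sketch.  The scores below are toy values invented for the example, not
measurements, and `error_rates` is a hypothetical helper; it also
simplifies to a two-way decision, counting anything below the cutoff
as delivered.

```python
def error_rates(scored, spam_cutoff):
    """Return (fp_rate, fn_rate) for (score, is_spam) pairs at a cutoff."""
    ham = [s for s, is_spam in scored if not is_spam]
    spam = [s for s, is_spam in scored if is_spam]
    fp = sum(s >= spam_cutoff for s in ham)   # nonspam flagged as spam
    fn = sum(s < spam_cutoff for s in spam)   # spam that gets delivered
    return fp / len(ham), fn / len(spam)


# Toy data: four nonspam and four spam messages with made-up scores.
toy = [(0.01, False), (0.20, False), (0.96, False), (0.03, False),
       (0.999, True), (0.97, True), (0.98, True), (0.60, True)]

low = error_rates(toy, 0.95)    # (0.25, 0.25)
high = error_rates(toy, 0.975)  # (0.0, 0.5)
```

With this toy data, raising the cutoff removes the one fp but doubles
the fn rate, which is the shape of the trade-off described; real
numbers depend entirely on the local corpora.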

-- 
| G r e g  L o u i s          | gpg public key: finger     |
|   http://www.bgl.nu/~glouis |   glouis at consultronics.com |
| http://wecanstopspam.org in signatures fights junk email |



