Training without ham.

Jef Poskanzer jef at acme.com
Mon Sep 8 02:52:48 CEST 2003


>I'm wondering what experiences people have with using bogofilter for site wide
>spam filtering. Since spam is generally the same for all accounts, while ham
>can differ widely (between nationalities for instance), is it viable to set up
>a bogofilter that only uses a spam corpus provided by some of the site
>administrators?

I think you'd get better results trying to train it on the *union* of
all your site's ham streams, rather than on no ham.  I started to do
this on acme, where there are basically only two users - me and my sister.
I took the wordlist I had already trained for myself and then added
in, as ham, all my sister's saved email.  Since I didn't actually
look at the mail, her privacy was preserved.  The resulting filter
suffered a drop in efficiency for a few days, letting a bunch of
spam through to me, but after some more training it was ok again.
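
For anyone who wants to try the same thing, here is a rough sketch of
the idea rather than the exact commands I ran.  It walks a saved mbox
and pipes each message to "bogofilter -n" (register as non-spam)
without ever displaying the mail.  The function names, the optional
wordlist directory argument and the injectable "run" parameter are my
own illustrative choices, and the error exit status of 3 is what the
man page lists, so check it against your version.

```python
import mailbox
import subprocess

# Exit status the bogofilter man page lists for I/O or other errors
# (assumption: verify against the version you have installed).
BOGOFILTER_ERROR = 3


def ham_command(wordlist_dir=None):
    """Build the command that registers a message on stdin as ham (-n)."""
    cmd = ["bogofilter", "-n"]
    if wordlist_dir is not None:
        # -d selects the directory holding the wordlist database.
        cmd += ["-d", wordlist_dir]
    return cmd


def register_mbox_as_ham(mbox_path, wordlist_dir=None, run=subprocess.run):
    """Feed every message in an mbox to bogofilter as ham.

    The messages are passed along as raw bytes and never printed, so
    nobody has to read them.  Returns the number of messages registered.
    """
    box = mailbox.mbox(mbox_path, create=False)
    count = 0
    try:
        for key in box.keys():
            raw = box.get_bytes(key)
            result = run(ham_command(wordlist_dir), input=raw)
            if result.returncode == BOGOFILTER_ERROR:
                raise RuntimeError("bogofilter failed on message %r" % (key,))
            count += 1
    finally:
        box.close()
    return count
```

Calling register_mbox_as_ham() once per user, with the same wordlist
directory each time, builds up the union of everyone's ham in a single
wordlist.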

On a larger system, rather than grubbing through people's directories
looking for email, I think what I'd do is register everyone's *outgoing*
email as ham.  That ought to be close enough to their incoming mail
for the filter to work acceptably.
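
I haven't built this, but a minimal sketch might look like the
following.  It assumes your MTA can pipe each outgoing message through
a filter program, which is site-specific and not shown.  The function
names and the "run" parameter are hypothetical.  The only bogofilter
feature used is "-n", which registers the message on stdin as
non-spam.

```python
import subprocess
import sys


def register_outgoing(raw_message, run=subprocess.run):
    """Register one outgoing message as ham and return it unchanged."""
    try:
        run(["bogofilter", "-n"], input=raw_message)
    except OSError:
        # Training is best effort: a missing or broken bogofilter
        # should never stop outgoing mail from being delivered.
        pass
    return raw_message


def main():
    """Act as a pass-through filter: stdin in, same bytes out."""
    data = sys.stdin.buffer.read()
    sys.stdout.buffer.write(register_outgoing(data))
```

The message goes out exactly as it came in.  The only side effect is
that its words get counted as ham in the shared wordlist.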
---
Jef

         Jef Poskanzer  jef at acme.com  http://www.acme.com/jef/