Training without ham.
Jef Poskanzer
jef at acme.com
Mon Sep 8 02:52:48 CEST 2003
>I'm wondering what experiences people have with using bogofilter for site wide
>spam filtering. Since spam is generally the same for all accounts, while ham
>can differ widely (between nationalities for instance), is it viable to set up
>a bogofilter that only uses a spam corpus provided by some of the site
>administrators?
I think you'd get better results trying to train it on the *union* of
all your site's ham streams, rather than on no ham. I started to do
this on acme, where there are basically only two users - me and my sister.
I took the wordlist I had already trained for myself and then added
in, as ham, all my sister's saved email. Since I didn't actually
look at the mail, her privacy was preserved. The resulting filter
suffered a drop in efficiency for a few days, letting a bunch of
spam through to me, but after some more training it was ok again.
On a larger system, rather than grubbing through people's directories
looking for email, I think what I'd do is register everyone's *outgoing*
email as ham. That ought to be close enough to their incoming mail
for the filter to work acceptably.
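
A rough sketch of the outgoing-mail idea, assuming your MTA can pipe a
copy of each outgoing message to a script. The only bogofilter options
used are -n (register the message on stdin as non-spam) and -d (wordlist
directory). The function names and the injectable `run` parameter are
made up for illustration, and how you hook this into your mail system is
left open.

```python
import subprocess
from typing import Callable, List, Optional


def ham_command(wordlist_dir: Optional[str] = None) -> List[str]:
    """Build the bogofilter command that registers stdin as ham.

    -n registers the message as non-spam; -d selects the wordlist
    directory (here assumed to be one shared, site-wide wordlist).
    """
    cmd = ["bogofilter", "-n"]
    if wordlist_dir:
        cmd += ["-d", wordlist_dir]
    return cmd


def register_outgoing_as_ham(
    raw_message: bytes,
    wordlist_dir: Optional[str] = None,
    run: Callable = subprocess.run,
):
    """Feed one outgoing message to bogofilter as ham.

    `run` defaults to subprocess.run; it is a parameter only so the
    function can be exercised without bogofilter installed.
    """
    return run(ham_command(wordlist_dir), input=raw_message, check=True)
```

The script never stores or displays the message, so the privacy
argument is the same as with my sister's saved mail: the text goes
into the wordlist and nobody reads it.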
---
Jef
Jef Poskanzer jef at acme.com http://www.acme.com/jef/