[PATCH] combined wordlist a.k.a. single list
Jeremy Blosser
jblosser-bogofilter at firinn.org
Thu Jun 5 00:42:47 CEST 2003
On Jun 04, Malcolm Dew-Jones [yf110 at victoria.tc.ca] wrote:
> On Tue, 3 Jun 2003, Jeremy Blosser wrote:
> >
> > Our goodlist is a pretty important resource for us. It took a lot of time
> > and effort to create the initial lists, and has taken even more time to
> > refine them with user feedback to something we can trust to filter all of
> > our mail, especially in a large heterogeneous environment like ours. On my
> > personal accounts at home I keep all the spam and nonspam I receive so I
> > can do wordlist rebuilds as I need them, but it'd be foolish to try that
> > here due to the volume of mail we see, privacy concerns about storing the
> > nonspam notwithstanding. We can't just recreate our existing goodlist from
> > mail we have stored somewhere for that purpose. We need to keep several
> > levels of backups of the goodlist, because it'd be hard to replace if we
> > somehow lost it, and our ability to block only spam (and never good mail)
> > is pretty tied to it.
> >
>
> $0.02
>
> We use the mail that our users _send_ as the primary source for our
> legitimate mail sample.
Yeah, we use this as one source as well.
> By definition, what they send is not spam.
Heh. I wish I could say the same, but we do have a large marketing
department. ;-)
> The assumption, which for the most part appears to be true, is that any
> mail they receive that is similar to what they send is indeed legitimate,
> job related communication.
Yes, but unfortunately for us it's not representative of the entire legit
incoming mail body. We receive a lot of legit stuff for which we don't have a
corresponding outgoing body.