[PATCH] combined wordlist a.k.a. single list

Jeremy Blosser jblosser-bogofilter at firinn.org
Thu Jun 5 00:42:47 CEST 2003


On Jun 04, Malcolm Dew-Jones [yf110 at victoria.tc.ca] wrote:
> On Tue, 3 Jun 2003, Jeremy Blosser wrote:
> > 
> > Our goodlist is a pretty important resource for us.  It took a lot of time
> > and effort to create the initial lists, and has taken even more time to
> > refine them with user feedback to something we can trust to filter all of
> > our mail, especially in a large heterogeneous environment like ours.  On my
> > personal accounts at home I keep all the spam and nonspam I receive so I
> > can do wordlist rebuilds as I need them, but it'd be foolish to try that
> > here due to the volume of mail we see, privacy concerns about storing the
> > nonspam notwithstanding.  We can't just recreate our existing goodlist from
> > mail we have stored somewhere for that purpose.  We need to keep several
> > levels of backups of the goodlist, because it'd be hard to replace if we
> > somehow lost it, and our ability to block only spam (and never good mail)
> > is pretty tied to it.
> > 
> 
> $0.02
> 
> We use the mail that our users _send_ as the primary source for our
> legitimate mail sample.

Yeah, we use this as well as one source.

> By definition, what they send is not spam. 

Heh.  I wish I could say the same, but we do have a large marketing
department.  ;-)

> The assumption, which for the most part appears to be true, is that any
> mail they receive that is similar to what they send is indeed legitimate,
> job related communication. 

Yes, but unfortunately for us it's not representative of the entire legit
incoming mail body.  We receive a lot of legit stuff for which we don't have a
corresponding outgoing body.
