bogofilter howto question

David Relson relson at osagesoftware.com
Tue Sep 23 18:52:55 CEST 2003


On Tue, 23 Sep 2003 09:35:49 -0700
p at dirac.org wrote:

> > I agree with you that beginning bogofilterers could benefit from a
> > statement to that effect, although the FAQ might be a better place
> > to put it than the tuning HOWTO that I wrote.  I hope you don't mind
> > my copying this reply to the bogofilter list
> 
> sure, no problem!
> 
> > and we'd be happy to have you join it, if you haven't already.
> 
> already done - the confirm message arrived a minute ago.  :)
> 
> so from your email, if i only occasionally get tarballs or mime
> encoded image/sound files, these would *not* be good emails to train
> with.   is that about right?
> 
> pete

Welcome Pete,

When doing selective training (as opposed to training on everything
you've got), pick a representative sample.  It's better to seed
bogofilter with more messages rather than fewer, so that it has a larger
wordlist (its knowledge base) to draw on when scoring spam vs. ham.
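
One way to read "representative sample" is to take the same fraction
of messages from each of your mail folders, so the training set keeps
roughly the same mix as your real mail.  The Python below is only a
sketch of that idea; the function name, the folder dictionary, and the
fraction are illustrative assumptions, and it is not part of bogofilter.

```python
import random


def pick_training_sample(messages_by_folder, fraction, seed=0):
    """Pick the same fraction of messages from every folder.

    messages_by_folder is a hypothetical mapping such as
    {"ham": [...], "spam": [...]}; the values can be file names,
    message ids, or anything else that identifies a message.
    """
    rng = random.Random(seed)  # fixed seed so the choice is repeatable
    sample = {}
    for folder, messages in messages_by_folder.items():
        if not messages:
            sample[folder] = []
            continue
        # At least one message per non-empty folder, never more than exist.
        count = min(len(messages), max(1, round(len(messages) * fraction)))
        sample[folder] = rng.sample(messages, count)
    return sample
```

The messages chosen this way would then be registered with bogofilter
as spam or ham in the usual way.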

Bogofilter ignores the innards of binary attachments on the grounds that
mining binary data for "words" doesn't provide useful info.  Skipping
large tarballs is fine.
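
To see how little text a message that is mostly attachment carries,
you can list just its text parts.  The sketch below uses Python's
standard email module; it illustrates the idea only, is not
bogofilter's own parser, and the function name is made up for the
example.

```python
from email import message_from_string


def text_parts(raw_message):
    """Return the payloads of the text/* parts of a raw message.

    Binary parts (application/*, image/*, audio/*, ...) are skipped.
    Payloads are returned as stored, without transfer decoding.
    """
    msg = message_from_string(raw_message)
    parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # container only; its children are visited by walk()
        if part.get_content_maintype() == "text":
            parts.append(part.get_payload())
    return parts
```

A message holding one short note plus a large tarball yields only the
note here, which is why leaving such messages out of the training set
costs very little.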

My practice is to feed everything to bogofilter so it has as much info
as possible.  This works well for me (with a 5 user domain).  Other
sites have different mixes - from 1 user to 80 to the many users of a
large company or an ISP.

I expect you'll quickly find a level of training that enables bogofilter
to give you good results.

David



