Training and -o

David Relson relson at
Sun Jul 18 00:22:12 CEST 2004

On Fri, 16 Jul 2004 19:29:06 -0800
Barsalou wrote:

> So I assume I would use the -o values during all use of bogofilter
> (hence putting it in the config file)
> Do folks want to share what values they are generally using so that we
> can come up with some sort of accepted standard?  Or does this create
> problems?
> What is the impact of not using the -o initially, then adding it
> later? If it makes more sense to use it from the start, then what
> would be good "starting values"?
> I am going to use the -l option then grep my logfile to come up with
> good -o values.  Let's see what happens.
> Mike
> -- 
> Barsalou <barjunk at>

Hi Mike,

Using -l is an excellent idea, as it'll give you a record of what
bogofilter has done.  When you use your unsures to train or correct
misclassifications, you'll probably want to record that too.
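
As a rough illustration of pulling scores out of such a log, here is a
minimal Python sketch.  It assumes each logged line carries a
"spamicity=<number>" field, as bogofilter's X-Bogosity header does; the
exact layout of your log lines may differ.  The sample lines and the
name extract_scores are made up for the example.

```python
import re

# Assumed pattern: a "spamicity=<float>" field somewhere in the line.
SPAMICITY_RE = re.compile(r"spamicity=([0-9]*\.?[0-9]+)")


def extract_scores(lines):
    """Return the spamicity values found in an iterable of log lines."""
    scores = []
    for line in lines:
        match = SPAMICITY_RE.search(line)
        if match:
            scores.append(float(match.group(1)))
    return scores


# Hypothetical log lines, for illustration only.
sample_log = [
    "Jul 17 10:01:02 host bogofilter[123]: X-Bogosity: Ham, spamicity=0.012345",
    "Jul 17 10:02:10 host bogofilter[124]: X-Bogosity: Unsure, spamicity=0.520000",
    "Jul 17 10:03:33 host bogofilter[125]: X-Bogosity: Spam, spamicity=0.999871",
    "Jul 17 10:04:00 host sendmail[126]: unrelated line",
]

scores = extract_scores(sample_log)
print(sorted(scores))
```

Looking at where the scores of your known ham and known spam cluster is
one way to see which cutoffs might be worth trying.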

I like to have my options in the config file (rather than the command
line), as I can keep a record of changes using RCS.  Actually,
bogofilter's command line scanner and config file scanner are closely
tied together.  All config file options can also be given on the command
line.  For example, "ham_cutoff=0.45" in the config file becomes
"--ham_cutoff=0.45" on the command line.

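To make the config file / command line correspondence concrete, here is
a small Python sketch that turns "name=value" config lines into the
matching "--name=value" options.  It assumes simple one-per-line
settings with "#" comments; config_to_args is a hypothetical helper, not
part of bogofilter, and the cutoff values are only illustrative.

```python
def config_to_args(config_text):
    """Turn 'name=value' config lines into '--name=value' arguments."""
    args = []
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "=" not in line:
            continue  # skip blank lines and anything that is not name=value
        name, value = line.split("=", 1)
        args.append("--{}={}".format(name.strip(), value.strip()))
    return args


# Illustrative settings, not recommendations.
example = """
# cutoffs
ham_cutoff=0.45
spam_cutoff = 0.99
"""

print(config_to_args(example))
```
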
Asking what other people use for their cutoffs is fine.  Keep in mind
that every mail site is different.  Some sites serve 1 user, some serve
5, some serve a company of 100 people, some are at large ISPs.  What's
best for you is probably somewhat different from what's best for anybody
else.
Bogofilter's default values were determined empirically (from a mix of
messages and sites) so as to give good values for everyone.  The _best_
values for _you_ can only be determined by you.  (Note: if you have a
corpus of several thousand each of ham and spam, you can use bogotune to
find the parameters that do the best job with the given corpus.)
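
Bogotune is the proper tool for the search itself.  To get a feel for
how a pair of cutoffs trades errors against unsures, though, a sketch
like the following can help.  It assumes you have (score, true class)
pairs for messages you've already sorted by hand, and that a message is
spam at or above spam_cutoff, ham at or below ham_cutoff, and unsure in
between (the boundary handling is my assumption).  The scores and the
candidate cutoffs are made up.

```python
def classify(score, ham_cutoff, spam_cutoff):
    """Three-way verdict under the assumed cutoff rule."""
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "ham"
    return "unsure"


def tally(scored, ham_cutoff, spam_cutoff):
    """Count outcomes for (score, truth) pairs; truth is 'ham' or 'spam'."""
    counts = {"correct": 0, "unsure": 0,
              "false_positive": 0, "false_negative": 0}
    for score, truth in scored:
        verdict = classify(score, ham_cutoff, spam_cutoff)
        if verdict == "unsure":
            counts["unsure"] += 1
        elif verdict == truth:
            counts["correct"] += 1
        elif verdict == "spam":
            counts["false_positive"] += 1  # ham flagged as spam
        else:
            counts["false_negative"] += 1  # spam passed as ham
    return counts


# Made-up scores for hand-sorted messages.
scored = [
    (0.01, "ham"), (0.30, "ham"), (0.62, "ham"),
    (0.48, "spam"), (0.97, "spam"), (0.999, "spam"),
]

# Hypothetical candidate (ham_cutoff, spam_cutoff) pairs.
for ham_cutoff, spam_cutoff in [(0.45, 0.99), (0.20, 0.95), (0.50, 0.60)]:
    print(ham_cutoff, spam_cutoff, tally(scored, ham_cutoff, spam_cutoff))
```

Cutoffs that sit far apart leave more messages unsure but make fewer
outright mistakes; cutoffs that sit close together do the reverse.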


