Training from scratch.
tanderso at oac-design.com
Fri Jul 16 14:00:14 CEST 2004
On Fri, 2004-07-16 at 02:57, mbox mbarsalou wrote:
> What I think you said is that if I set my cutoff values so that a
> spamicity value of .75 - 1 means spam and .0 - .4 means ham (as an example),
> then .4 - .75 would be where the "unsures" fall. I would generate these
> values by looking at the bogosity values throughout all my mail and
> looking at the highest value for ham and the lowest value when the
> computer thinks it is spam. I could then use the -o command to execute
No. Of all the actual spams that arrive, the one that bogofilter scores
lowest marks where your ham cutoff should go if you want to avoid (or at
least minimize) false negatives (spam classified as ham). However, if your lowest scoring
spam is at 0.0, then clearly you'll need to have a higher cutoff than
that (for tri-state classifications), and you'll need to work on
The same with the spam cutoff... of all of your actual hams, the one
that scores the highest should be your spam cutoff in order to reduce
false positives (ham classified as spam).
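A minimal sketch of that rule in Python, assuming you have collected the spamicity scores of messages you sorted by hand. Nothing here is part of bogofilter: the function names, the 0.01 margin, and the sample scores are made up for illustration, and the boundary handling (score >= spam cutoff is spam, score <= ham cutoff is ham) is an assumption. If the two groups do not overlap at all, the rule as stated would put the ham cutoff above the spam cutoff, so the sketch swaps them and treats the gap as the unsure range.

```python
def suggest_cutoffs(ham_scores, spam_scores, margin=0.01):
    """Return (ham_cutoff, spam_cutoff) from hand-sorted scores."""
    if not ham_scores or not spam_scores:
        raise ValueError("need at least one ham and one spam score")
    # Just below the lowest-scoring real spam, but never below 0.
    below_spam = max(0.0, min(spam_scores) - margin)
    # Just above the highest-scoring real ham, but never above 1.
    above_ham = min(1.0, max(ham_scores) + margin)
    # Overlapping groups give below_spam < above_ham already; if the
    # groups are cleanly separated the two values come out reversed,
    # so sort them to keep ham_cutoff <= spam_cutoff.
    ham_cutoff, spam_cutoff = sorted((below_spam, above_ham))
    return ham_cutoff, spam_cutoff


def classify(score, ham_cutoff, spam_cutoff):
    """Tri-state result for one score (boundary handling is assumed)."""
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "ham"
    return "unsure"


# Made-up scores: one ham scores high (0.55), one spam scores low (0.35).
ham_cutoff, spam_cutoff = suggest_cutoffs([0.0, 0.12, 0.55], [0.35, 0.9, 1.0])
```

With those sample scores the hammy spam and the spammy ham both land in the unsure range, so neither becomes a false negative or a false positive.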
Of course, these numbers will change over time as you continue to
train. I think most people end up lowering their spam cutoff to near
0.5 as bogofilter gets much better at determining what is ham. But ham
cutoffs are usually kept very low, since spammer tricks sometimes make
spam look hammy.
> What would the command actually look like? Or if using the config file,
> what would it look like?
Take a look in your config file, similar to the one linked above. It
looks something like:
## with Yes/No/Unsure output:
ham_cutoff = 0.45
Your main system-wide config should be in /etc/bogofilter.cf. At the
top of that file, there's a value called "user_config_file" which tells
you where individual user configs are located. The default is
"~/.bogofilter.cf", but you can change this. Each user on your system
may want to have different cutoffs, so you should set cutoffs in your
user config file.
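To see which cutoffs would end up in effect for one user, here is a rough Python sketch. It assumes the config files are plain "name = value" lines with "#" comments, and that a value in the user config replaces the same value from the system-wide one; both are assumptions about the file handling, not something taken from the bogofilter source. The function names and the spam_cutoff numbers in the sample are invented.

```python
def read_cutoffs(config_text):
    """Pick ham_cutoff / spam_cutoff out of 'name = value' lines."""
    cutoffs = {}
    for line in config_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if "=" not in line:
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        if name in ("ham_cutoff", "spam_cutoff"):
            cutoffs[name] = float(value)
    return cutoffs


def effective_cutoffs(system_text, user_text):
    """System-wide values first, then the user's values on top (assumed order)."""
    merged = read_cutoffs(system_text)
    merged.update(read_cutoffs(user_text))
    return merged


# Sample contents only; read the real files from disk in practice.
system_cf = "## with Yes/No/Unsure output:\nham_cutoff = 0.45\nspam_cutoff = 0.99\n"
user_cf = "spam_cutoff = 0.60  # this user's own choice\n"
result = effective_cutoffs(system_cf, user_cf)
```

Here the user keeps the system-wide ham_cutoff and only overrides spam_cutoff.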
> How can I be sure that it is working like I expect it to?
Note whether any emails are classified incorrectly outside of your
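
One way to check this is to pull the score out of the X-Bogosity header of mail you have already sorted by hand and list anything that landed on the wrong side of a cutoff. The Python sketch below assumes the header carries a "spamicity=<number>" field, as in the made-up sample message; the function names and the (message, is_spam) pairing are hypothetical.

```python
import email
import re

# Assumed header shape: "X-Bogosity: ..., spamicity=0.520000, ..."
SPAMICITY = re.compile(r"spamicity=([0-9]*\.?[0-9]+)")


def spamicity(raw_message):
    """Return the score from the X-Bogosity header, or None if absent."""
    header = email.message_from_string(raw_message).get("X-Bogosity", "")
    match = SPAMICITY.search(header)
    return float(match.group(1)) if match else None


def misclassified(messages, ham_cutoff, spam_cutoff):
    """messages: (raw_message, is_spam) pairs sorted by hand.

    Returns (score, is_spam) for spams scored as ham and hams scored
    as spam. Unsures and messages without a score are skipped.
    """
    errors = []
    for raw, is_spam in messages:
        score = spamicity(raw)
        if score is None:
            continue
        if (is_spam and score <= ham_cutoff) or (not is_spam and score >= spam_cutoff):
            errors.append((score, is_spam))
    return errors


# Made-up message for illustration.
sample = (
    "From: someone@example.com\n"
    "X-Bogosity: Unsure, tests=bogofilter, spamicity=0.520000, version=0.92.0\n"
    "Subject: hello\n"
    "\n"
    "body text\n"
)
score = spamicity(sample)
```

Anything this returns is a candidate for training on error, and the scores show which cutoff needs to move.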
> > From: "mbox mbarsalou" <barjunk at attglobal.net>
> > > I would like to add to this conversation by asking what about after the
> > > training? How do I get it to start discarding spam that falls into the
> > > unsure category?
> > If you continue to train on errors (anything which is unsure or
> > misclassified) using -s, -n, -S, and -N (see the man page), bogofilter will
> > continue to learn and improve filtering. You can also adjust your cutoffs.
> > This is what -o does, but you should probably put cutoffs in your config
> > file instead of on the command itself. This way you'll be consistent.
> > There's a description of cutoffs in the config file as well as the man page.
> > If you find that you get lots of spams in your unsure range, you can
> > consider lowering your spam_cutoff. Basically, your spam_cutoff should be
> > just above your highest scoring ham, and your ham_cutoff should be just
> > below your lowest scoring spam. That should keep only spam classified as
> > spam, and only ham classified as ham, with a little of both in your unsures.
> > Depending on your tolerance for false negatives or false positives, you may
> > want to lower your spam_cutoff and raise your ham_cutoff beyond those
> > determined by this simple rule. You'll get a feel for where your hams and
> > spams score by looking at the X-bogosity line (in the email header) in any
> > misclassified emails you get.
> > You can try bfproxy (http://orderamidchaos.com/bogofilter/bfproxy) to easily
> > do training-on-error via email.
> > Tom