Training from scratch.
tanderso at oac-design.com
Thu Jul 15 16:57:03 CEST 2004
From: "mbox mbarsalou" <barjunk at attglobal.net>
> I would like to add to this conversation by asking what about after the
> training? How do I get it to start discarding spam that fall into the
> unsure category?
If you continue to train on errors (anything that is unsure or
misclassified) using -s, -n, -S, and -N (see the man page), bogofilter will
keep learning and its filtering will keep improving. -s and -n register a
message as spam or ham; -S and -N undo an earlier spam or ham registration.
You can also adjust your cutoffs. The -o option sets them on the command
line, but you should probably put them in your config file instead, so that
every run uses the same values. The cutoffs are described in the config
file as well as in the man page.
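Training on error comes down to picking, for each message you have checked, which of those flags to pass. Here is a minimal Python sketch of that decision. It assumes the verdict labels Spam, Ham and Unsure (older versions and custom configs may print different words). It also assumes that `auto_registered` means the message was classified with bogofilter's -u option, which registers messages it classifies as spam or ham but leaves unsure ones unregistered. The helper name `training_flags` is hypothetical.

```python
# Hypothetical helper (not part of bogofilter): choose the training flags
# for one message whose true class you have checked by hand.
#   -s register as spam      -n register as ham
#   -S unregister as spam    -N unregister as ham
# Assumptions: verdicts are "Spam", "Ham" or "Unsure"; auto_registered=True
# means the message was classified with -u, which registers spam/ham
# verdicts but leaves unsure messages unregistered.

def training_flags(verdict: str, is_spam: bool, auto_registered: bool = False) -> str:
    """Return the flags for training on error, or "" if none are needed."""
    verdict = verdict.strip().lower()
    if verdict not in ("spam", "ham", "unsure"):
        raise ValueError(f"unknown verdict: {verdict!r}")
    register = "-s" if is_spam else "-n"
    if verdict == "unsure":
        # Unsure messages always count as a training error; just register them.
        return register
    if (verdict == "spam") == is_spam:
        # Classified correctly: training on error leaves it alone.
        return ""
    if auto_registered:
        # Undo the wrong registration, then register the right class,
        # e.g. "-Sn" for ham that was registered as spam.
        undo = "-N" if is_spam else "-S"
        return undo + register[1]
    return register
```

The returned string is meant to be passed to bogofilter with the message on standard input, for example `bogofilter -Sn < message` for a ham that -u had registered as spam.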
If you find that you get a lot of spam in your unsure range, you can
consider lowering your spam_cutoff. As a rule of thumb, your spam_cutoff
should be just above your highest-scoring ham, and your ham_cutoff should be
just below your lowest-scoring spam. That should keep only spam classified
as spam, and only ham classified as ham, with a little of both in your
unsures.
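That rule of thumb is simple arithmetic over the scores you have observed. Here is a minimal Python sketch, assuming you have already collected spamicity scores (0 to 1) for messages you know to be ham and spam. The function name `suggest_cutoffs` and the default `margin` are hypothetical and are not part of bogofilter.

```python
def suggest_cutoffs(ham_scores, spam_scores, margin=0.001):
    """Rule of thumb: spam_cutoff just above the highest-scoring ham,
    ham_cutoff just below the lowest-scoring spam.

    margin is a hypothetical nudge standing in for "just above/below".
    Returns (spam_cutoff, ham_cutoff).
    """
    if not ham_scores or not spam_scores:
        raise ValueError("need at least one ham score and one spam score")
    highest_ham = max(ham_scores)
    lowest_spam = min(spam_scores)
    spam_cutoff = min(1.0, highest_ham + margin)
    ham_cutoff = max(0.0, lowest_spam - margin)
    if ham_cutoff > spam_cutoff:
        # No overlap: every ham scores well below every spam, so there is
        # no unsure band to keep. Put both cutoffs at the midpoint.
        spam_cutoff = ham_cutoff = (highest_ham + lowest_spam) / 2
    return spam_cutoff, ham_cutoff
```

The two results would go into the spam_cutoff and ham_cutoff settings of your config file. The midpoint branch is only a choice made in this sketch; check the man page to see whether your version accepts equal cutoffs before using it.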
Depending on your tolerance for false negatives or false positives, you may
want to lower your spam_cutoff and raise your ham_cutoff beyond the values
this simple rule gives. You'll get a feel for where your hams and spams
score by looking at the X-Bogosity line (in the email header) of any
misclassified emails you receive.
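Collecting those scores by hand gets tedious, so here is a minimal Python sketch that pulls the spamicity value out of a saved message's X-Bogosity header. It assumes the header contains a `spamicity=<number>` field, as in "Spam, tests=bogofilter, spamicity=0.999812, version=..."; the exact wording varies between bogofilter versions and configurations. The name `bogosity_score` is hypothetical.

```python
import re
from email import message_from_string

# Assumed header shape: "..., spamicity=0.999812, ...". Adjust the pattern
# if your bogofilter version or config writes the header differently.
SPAMICITY_RE = re.compile(r"spamicity=([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)")


def bogosity_score(raw_message: str):
    """Return the spamicity from the X-Bogosity header, or None if absent."""
    msg = message_from_string(raw_message)
    header = msg.get("X-Bogosity")  # header lookup is case-insensitive
    if header is None:
        return None
    match = SPAMICITY_RE.search(str(header))
    return float(match.group(1)) if match else None
```

Running it over the messages you have sorted into ham and spam gives score lists you could feed into the cutoff rule above.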
You can try bfproxy (http://orderamidchaos.com/bogofilter/bfproxy) to do
training on error easily via email.