Training from scratch.
barjunk at attglobal.net
Fri Jul 16 02:57:03 EDT 2004
After that beautiful explanation, I think I am still confused... I will
put it into my own words because I am not sure what some of the
terminology you are using means.
What I think you said is this: I could set my cutoff values so that a
spamicity value of .75 - 1 means spam and .0 - .4 means ham (as an example).
Then .4 - .75 would be where the "unsures" fall. I would choose these
values by looking at the bogosity values throughout all my mail and
finding the highest value for ham and the lowest value for mail that the
computer thinks is spam. I could then use the -o option to apply
the above idea.
What would the command actually look like? Or if using the config file,
what would it look like?
How can I be sure that it is working like I expect it to?
> From: "mbox mbarsalou" <barjunk at attglobal.net>
> > I would like to add to this conversation by asking: what about after the
> > training? How do I get it to start discarding spam that falls into the
> > unsure category?
> If you continue to train on errors (anything which is unsure or
> misclassified) using -s, -n, -S, and -N (see the man page), bogofilter will
> continue to learn and improve filtering. You can also adjust your cutoffs.
> This is what -o does, but you should probably put cutoffs in your config
> file instead of on the command line itself. This way you'll be consistent.
> There's a description of cutoffs in the config file as well as the man page.
> If you find that you get lots of spams in your unsure range, you can
> consider lowering your spam_cutoff. Basically, your spam_cutoff should be
> just above your highest scoring ham, and your ham_cutoff should be just
> below your lowest scoring spam. That should keep only spam classified as
> spam, and only ham classified as ham, with a little of both in your unsures.
> Depending on your tolerance for false negatives or false positives, you may
> want to lower your spam_cutoff and raise your ham_cutoff beyond those
> determined by this simple rule. You'll get a feel for where your hams and
> spams score by looking at the X-Bogosity line (in the email header) in any
> misclassified emails you get.
> You can try bfproxy (http://orderamidchaos.com/bogofilter/bfproxy) to easily
> do training-on-error via email.
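
The cutoff rule in the quoted reply (spam_cutoff just above the highest
scoring ham, ham_cutoff just below the lowest scoring spam) can be sketched
in a few lines of Python. This is only an illustration, not part of
bogofilter: the function names, the 0.01 margin, and the sample scores are
made up, and it assumes you have already collected the spamicity values
from the X-Bogosity headers of mail you sorted by hand. It also assumes a
message counts as spam at or above spam_cutoff and as ham at or below
ham_cutoff; check the man page for the exact comparison.

```python
def suggest_cutoffs(ham_scores, spam_scores, margin=0.01):
    """Return (ham_cutoff, spam_cutoff) following the rule in the quoted reply.

    ham_scores / spam_scores: spamicity values (0.0 - 1.0) of hand-sorted mail.
    margin is a hypothetical safety gap, not a bogofilter setting.
    Raises ValueError if either list is empty.
    """
    # spam_cutoff: just above the highest scoring ham.
    above_ham = min(1.0, round(max(ham_scores) + margin, 4))
    # ham_cutoff: just below the lowest scoring spam.
    below_spam = max(0.0, round(min(spam_scores) - margin, 4))
    # Assumption of this sketch: if ham and spam do not overlap at all, the
    # two values come out in the wrong order, so sort them to keep
    # ham_cutoff <= spam_cutoff. The gap between them is the unsure range.
    ham_cutoff, spam_cutoff = sorted((below_spam, above_ham))
    return ham_cutoff, spam_cutoff


def classify(score, ham_cutoff, spam_cutoff):
    """Three-way classification: 'spam', 'ham', or 'unsure'."""
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "ham"
    return "unsure"


if __name__ == "__main__":
    # Made-up example scores; substitute values from your own mail.
    ham = [0.02, 0.10, 0.60]
    spam = [0.45, 0.90, 0.99]
    ham_cutoff, spam_cutoff = suggest_cutoffs(ham, spam)
    print("ham_cutoff:", ham_cutoff)
    print("spam_cutoff:", spam_cutoff)
    for score in (0.10, 0.50, 0.90):
        print(score, "->", classify(score, ham_cutoff, spam_cutoff))
```

The two numbers it suggests are the values you would then put in the
spam_cutoff and ham_cutoff settings of your config file. To check that
filtering behaves the way you expect, compare the classification on the
X-Bogosity line of newly arrived mail against these ranges.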