tri-state classification

David Relson relson at osagesoftware.com
Wed Oct 27 01:54:22 CEST 2004


Hi Pavel,

Your comments are appreciated!  Responses follow...


On Wed, 27 Oct 2004 01:14:06 +0200 (CEST)
Pavel Kankovsky wrote:

> On Sun, 24 Oct 2004, David Relson wrote:
> 
> > I propose that bogofilter's default configuration be changed to use
> > tri-state classification with a conservative ham cutoff of 0.4 and
> > with bogosity tags of "Spam", "Ham", and "Unsure".
> 
> This may be a nasty surprise for people relying on the current default
> config (for instance, I myself am already using tri-state
> classification but I'll have to either 1. set spamicity_tags
> explicitly, or 2. review all my utility scripts to make sure they can
> handle "Spam" instead of "Yes" etc.).

RELEASE.NOTES-0.93 covers the subject.  I've included a copy at the end
of this message.  It's also mentioned in the NEWS-0.9x and CHANGES-0.9x
and will be prominently mentioned in the announcement.
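
For a utility that has to cope with both tag sets during the transition,
something along these lines might do.  This is only a sketch and not part
of bogofilter: msg_class, has_tag() and classify_header() are names made
up for the example, and it assumes the tag is the first word after an
exactly spelled "X-Bogosity: " prefix.

```c
#include <string.h>

/* Hypothetical helper for a mail-handling utility; not bogofilter code. */
typedef enum { MSG_SPAM, MSG_HAM, MSG_UNSURE, MSG_UNKNOWN } msg_class;

/* Nonzero if s starts with tag and the tag ends there (end of string,
 * comma or whitespace), so that "Nope" does not match "No". */
static int has_tag(const char *s, const char *tag)
{
    size_t n = strlen(tag);
    char c;

    if (strncmp(s, tag, n) != 0)
        return 0;
    c = s[n];
    return c == '\0' || c == ',' || c == ' ' ||
           c == '\t' || c == '\r' || c == '\n';
}

/* Map one header line to a class, accepting old and new tags alike. */
msg_class classify_header(const char *line)
{
    static const char prefix[] = "X-Bogosity: ";
    const char *value;

    if (strncmp(line, prefix, sizeof prefix - 1) != 0)
        return MSG_UNKNOWN;            /* not an X-Bogosity line */
    value = line + (sizeof prefix - 1);

    if (has_tag(value, "Spam") || has_tag(value, "Yes"))
        return MSG_SPAM;
    if (has_tag(value, "Ham") || has_tag(value, "No"))
        return MSG_HAM;
    if (has_tag(value, "Unsure"))
        return MSG_UNSURE;
    return MSG_UNKNOWN;
}
```

Accepting both spellings means the same check works before and after the
defaults change.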

> Perhaps there should be some kind of transition period when bogofilter
> would print a warning whenever it would use one of the changing
> default values.
> 
> In fact, one might add a new directive to the config file, say
> "config_version", telling bogofilter what default values are assumed
> by the user, and bogofilter would either 1. use the right default
> value, or 2. print a warning when the current value has changed since
> the version the config file is based on.

Can you expand on that idea?  My gut reaction is that any such check
will make matters even more complicated and will be kludgy.  However, I
could be wrong.

> Are you going to change the terse output (bogofilter -t) or leave it
> as it is now (Y/N/U)?

The terse output uses the first character of the spamicity_tags, so the
output will change from Y/N/U to S/H/U.

File format.c defines both tag sets, i.e. Yes/No/Unsure and
Spam/Ham/Unsure, and uses the one that spamicity_tags points to.  The
actual statements are:

static FIELD spamicity_tags_ynu[RC_COUNT] = { "Yes",   "No",    "Unsure" };
static FIELD spamicity_tags_shu[RC_COUNT] = { "Spam",  "Ham",   "Unsure" };

FIELD  *spamicity_tags    = spamicity_tags_shu;
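
To illustrate the first-character rule, here is a standalone sketch.  It
is not the format.c source: the FIELD typedef, the RC_* enum names and
their order, and terse_char() are assumptions made for the example.

```c
/* Assumed definitions, for illustration only. */
typedef const char *FIELD;
typedef enum { RC_SPAM, RC_HAM, RC_UNSURE, RC_COUNT } rc_t;

static FIELD spamicity_tags_ynu[RC_COUNT] = { "Yes",  "No",  "Unsure" };
static FIELD spamicity_tags_shu[RC_COUNT] = { "Spam", "Ham", "Unsure" };

/* The active tag set; a pointer swap selects the other one. */
static FIELD *spamicity_tags = spamicity_tags_shu;

/* Terse output: the first character of the active tag for this class. */
static char terse_char(rc_t rc)
{
    return spamicity_tags[rc][0];
}
```

With the pointer as shown, terse_char() yields S, H and U; pointing
spamicity_tags at spamicity_tags_ynu gives back Y, N and U.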

HTH,

David


--- begin RELEASE.NOTES-0.93 ---

Bogofilter's defaults have been changed.  It now operates in tri-state
mode and will classify messages as Spam, Ham, or Unsure.

If you're checking messages for "X-Bogosity: Yes" or "X-Bogosity: No",
you _need_ to change your checks.  Use "X-Bogosity: Spam" and
"X-Bogosity: Ham" instead of the old forms.  Also, checking for
"X-Bogosity: Unsure" and putting those messages in a separate folder
(or mailbox) will give you an excellent set of messages for training,
as "Unsure" messages are messages that bogofilter has too little
information to classify (with certainty) as spam or ham.

--- end RELEASE.NOTES-0.93 ---



