tri-state classification

Pavel Kankovsky peak at argo.troja.mff.cuni.cz
Thu Oct 28 17:10:54 CEST 2004


On Tue, 26 Oct 2004, David Relson wrote:

> RELEASE.NOTES-0.93 covers the subject.  I've included a copy at the end
> of this message.  It's also mentioned in the NEWS-0.9x and CHANGES-0.9x
> and will be prominently mentioned in the announcement.

Yes, sure. But we all know how much attention people pay to the
documentation. :)

> > In fact, one might add a new directive to the config file, say
> > "config_version", telling bogofilter what default values are assumed
> > by the user, [...]
>
> Can you expand on that idea?  My gut reaction is that any such check
> will make matters even more complicated and will be kludgy.  However, I
> could be wrong.

Example: the user has used version 1.2.3 with a configuration file
including a line reading "config_version 1.2.3".

A new version, let's call it 2.3.4, having some default values (or, God
forbid, the semantics of some configuration directives) different from
1.2.3 is released and installed. 2.3.4 sees "config_version 1.2.3", i.e.
it knows its behaviour might differ from what the user expects.

IMHO, there are 3 basic options (in the order of increasing implementation
complexity and decreasing user inconvenience):

1. print a warning unconditionally and go on (this option was not
   explicitly mentioned in the original proposal),

2. test whether the semantics of the *given* configuration file,
   taking its *actual* settings into account, would change, print a
   warning if necessary, and go on,

3. try emulating 1.2.3 (and perhaps print a warning as well).

I agree with Matthias that option #1 is sufficient (for now). Option #3
might be worth considering if backward-incompatible changes ever have to
be made between two stable post-1.0 versions (for whatever reason).
Option #2 would be somewhat easier to implement than #3 (AFAICT), but its
reduction of inconvenience compared to #1 would be minor (as long as
backward-incompatible changes are as rare as they should be); ergo, #2
appears to be pointless.
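A minimal sketch of option #1 in Python might look like this. Everything in it is hypothetical: the "config_version" directive is only a proposal, and PROGRAM_VERSION, read_config_version() and config_version_warning() are illustrative names, not part of bogofilter. Accepting both "name value" and "name=value" lines is likewise an assumption.

```python
# Hypothetical version of the running program ("2.3.4" in the example above).
PROGRAM_VERSION = "2.3.4"


def read_config_version(lines):
    """Return the value of the proposed "config_version" directive, or None."""
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        # Assumption: both "name value" and "name=value" forms are allowed.
        key, _, value = line.replace("=", " ", 1).partition(" ")
        if key == "config_version":
            return value.strip()
    return None


def config_version_warning(lines, program_version=PROGRAM_VERSION):
    """Option #1: return a warning whenever the versions differ, else None.

    The caller prints the warning and goes on; the actual settings in the
    file are not examined at all. A file without the directive is treated
    as matching the running version (another assumption).
    """
    declared = read_config_version(lines)
    if declared is None or declared == program_version:
        return None
    return (f"warning: configuration written for version {declared}, "
            f"running {program_version}; defaults may have changed")


if __name__ == "__main__":
    import sys
    msg = config_version_warning(["config_version 1.2.3", "# other settings"])
    if msg:
        print(msg, file=sys.stderr)
```

The point of option #1 shows in config_version_warning(): it compares two version strings and nothing else, which is what keeps it cheap compared to #2 and #3.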


On Wed, 27 Oct 2004, Tom Anderson wrote:

> I also think that would be kludgy.  Bogofilter -Q, if I recall correctly, 
> outputs the config info.  Any utility scripts should be updated to get the 
> appropriate format from there.

1. The need to grok the output of -Q adds extra complexity. Besides,
   there is -T (as pointed out by Matthias), which is probably the best
   solution whenever machine-readable output is needed (or not...
   see below).

2. It would not work for scripts supposed to process both historical
   and new data, e.g. a script that processes a mail archive and compares
   the scores and classifications computed when each message arrived
   (recorded in X-Bogosity) with those computed using the current
   configuration and db.
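To make point 2 concrete: such a script has to understand whatever the archived X-Bogosity headers say, not just what the current release prints. The Python sketch below normalizes both vocabularies to one classification. It assumes older headers use the tags Yes/No/Unsure and newer ones Spam/Ham/Unsure, and that each header carries a "spamicity=<number>" field; parse_x_bogosity(), classification_changed() and the sample headers are hypothetical, so check the release notes for the real formats.

```python
import re

# Assumed tag vocabularies: older releases wrote Yes/No/Unsure, newer ones
# Spam/Ham/Unsure. Verify against the release notes before relying on this.
TAG_TO_CLASS = {
    "yes": "spam", "spam": "spam",
    "no": "ham", "ham": "ham",
    "unsure": "unsure",
}

# Assumed header shape: "X-Bogosity: <tag>, ..., spamicity=<number>, ..."
HEADER_RE = re.compile(
    r"^X-Bogosity:\s*(?P<tag>[A-Za-z]+)\s*,.*?\bspamicity=(?P<score>[^,\s]+)",
    re.IGNORECASE,
)


def parse_x_bogosity(header_line):
    """Return (classification, score) from an X-Bogosity line, or None."""
    m = HEADER_RE.match(header_line.strip())
    if m is None:
        return None
    cls = TAG_TO_CLASS.get(m.group("tag").lower())
    if cls is None:
        return None
    return cls, float(m.group("score"))


def classification_changed(archived_header, current_header):
    """True if the normalized classifications differ (scores are ignored)."""
    old = parse_x_bogosity(archived_header)
    new = parse_x_bogosity(current_header)
    if old is None or new is None:
        raise ValueError("unparseable X-Bogosity header")
    return old[0] != new[0]
```

Normalizing to a neutral vocabulary first means the comparison itself never has to know which release wrote which header.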


On Thu, 28 Oct 2004, Matthias Andree wrote:

> We have an "invariant terse mode" for use in scripts, bogofilter -T.

Great! I guess I should read the documentation more carefully (see
above). :P  It might be a good idea to stress the existence of this
feature in the release notes so people fixing their scripts fix them the
right way.

Unfortunately, there is a small catch: -T prints the score using %g, and
one needs to do full parsing of the resulting text to make any sense out
of it (farewell, sed, I loved you...<g>). There is -TT printing the score
in a fixed format but it does not print the classification.
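The catch with %g is that it switches between plain and exponential notation and drops trailing zeros, so no single fixed pattern matches every score. A sketch in Python, assuming a -T line consists of a classification tag followed by the score (check the man page for the exact layout; the "U"/"S" tags in the examples are assumptions too); parse_terse() is a hypothetical helper:

```python
def parse_terse(line):
    """Split an assumed "<tag> <score>" line and convert the %g score.

    float() accepts every form %g can produce, e.g. "0.5", "1" and "1e-05".
    """
    fields = line.split()
    if len(fields) < 2:
        raise ValueError(f"unexpected terse line: {line!r}")
    return fields[0], float(fields[1])


if __name__ == "__main__":
    # Python's "%g" follows C's printf rules, so these are shapes a -T
    # score can take. A sed pattern such as 0\.[0-9]* misses the last two.
    for score in (0.999999, 0.5, 1.0, 0.00001):
        print("%g" % score)  # 0.999999, 0.5, 1, 1e-05 (one per line)
    print(parse_terse("U 0.5"))
```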

Hmmm...I could also run bogofilter with explicit --spamicity_tags,
--spamicity_formats et al. to get predictable output in a desired
format...
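That route could look roughly like this in Python. The sketch only builds the extra arguments and shows that, once the format is pinned, a simple pattern suffices; it does not run bogofilter. It assumes both options take a comma-separated list ordered spam, ham, unsure, that they can be given as --name=value on the command line, and that the resulting line contains "spamicity=<number>"; verify all of this against the bogofilter documentation. The chosen tags, the %0.6f format and the helper names are illustrative.

```python
import re

TAGS = ("Spam", "Ham", "Unsure")  # assumed order: spam, ham, unsure
FORMAT = "%0.6f"                  # fixed six-decimal format for every class


def pinned_output_args(tags=TAGS, fmt=FORMAT):
    """Options that pin the tags and score format (hypothetical helper)."""
    return [
        "--spamicity_tags=" + ",".join(tags),
        "--spamicity_formats=" + ",".join([fmt] * len(tags)),
    ]


# With the format pinned to %0.6f the score is always d.dddddd, so a plain
# regular expression (or a sed one-liner) is enough.
PINNED_RE = re.compile(r"^X-Bogosity: (Spam|Ham|Unsure),.*spamicity=(\d\.\d{6})")


def parse_pinned(line):
    """Return (tag, score) from a line produced with the pinned settings."""
    m = PINNED_RE.match(line)
    return (m.group(1), float(m.group(2))) if m else None
```

A %g-style score such as "1e-05" simply fails to match here, which is the desired behaviour: the script notices that the output is not in the format it asked for.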

--Pavel Kankovsky aka Peak  [ Boycott Microsoft--http://www.vcnet.com/bms ]
"Resistance is futile. Open your source code and prepare for assimilation."
