tri-state classification
Pavel Kankovsky
peak at argo.troja.mff.cuni.cz
Thu Oct 28 17:10:54 CEST 2004
On Tue, 26 Oct 2004, David Relson wrote:
> RELEASE.NOTES-0.93 covers the subject. I've included a copy at the end
> of this message. It's also mentioned in the NEWS-0.9x and CHANGES-0.9x
> and will be prominently mentioned in the announcement.
Yes, sure. But we all know how much attention people pay to the
documentation. :)
> > In fact, one might add a new directive to the config file, say
> > "config_version", telling bogofilter what default values are assumed
> > by the user, [...]
>
> Can you expand on that idea? My gut reaction is that any such check
> will make matters even more complicated and will be kludgy. However, I
> could be wrong.
Example: the user has used version 1.2.3 with a configuration file
including a line reading "config_version 1.2.3".
A new version, let's call it 2.3.4, is released and installed, and some of
its default values (or, God forbid, the semantics of some configuration
directives) differ from those of 1.2.3. 2.3.4 sees "config_version 1.2.3",
i.e. it knows its behaviour might differ from what the user expects.
IMHO, there are 3 basic options (in the order of increasing implementation
complexity and decreasing user inconvenience):
1. print a warning unconditionally and go on (this option was not
explicitly mentioned in the original proposal),
2. test whether the semantics of the *given* configuration file,
taking its *actual* settings into account, would change, print a
warning if necessary, and go on,
3. try emulating 1.2.3 (and perhaps print a warning as well).
I agree with Matthias that option #1 is sufficient (for now). Option #3
might be worth considering if such a backward-incompatible change has to
be made between two stable post-1.0 versions (for whatever reason).
Option #2 would be somewhat easier to implement than #3 (AFAICT), but its
reduction of inconvenience compared to #1 would be minor (as long as
backward-incompatible changes are as rare as they should be); ergo, #2
appears to be pointless.
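For illustration only, here is a minimal Python sketch of option #1. config_version is just the directive proposed above, and bogofilter is not assumed to support it. The names RUNNING_VERSION, find_config_version() and config_version_warning() are hypothetical. The sketch accepts both "config_version 1.2.3" and "config_version=1.2.3" lines. It returns a warning whenever the declared version differs from the running one and makes no attempt to work out whether the file's actual settings are affected.

```python
import sys

# Hypothetical: version of the program that reads the configuration file.
RUNNING_VERSION = "2.3.4"


def parse_version(text):
    """Turn a dotted version string such as "1.2.3" into (1, 2, 3)."""
    return tuple(int(part) for part in text.strip().split("."))


def find_config_version(lines):
    """Return the value of the proposed config_version directive, or None.

    Accepts "config_version 1.2.3" as well as "config_version=1.2.3";
    blank lines and lines starting with '#' are skipped.
    """
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.replace("=", " ", 1).split(None, 1)
        if len(parts) == 2 and parts[0] == "config_version":
            return parts[1].strip()
    return None


def config_version_warning(lines, running=RUNNING_VERSION):
    """Option #1: return a warning text on any version mismatch, else None.

    The caller prints the warning and goes on; no attempt is made to check
    whether the file's actual settings are affected (that would be option #2).
    """
    declared = find_config_version(lines)
    if declared is None or parse_version(declared) == parse_version(running):
        return None
    return (f"warning: configuration file was written for version {declared}, "
            f"this is {running}; some defaults may have changed")


if __name__ == "__main__":
    message = config_version_warning(["# my settings", "config_version 1.2.3"])
    if message:
        print(message, file=sys.stderr)
    # ...processing continues regardless of the warning...
```

A file without the directive produces no warning at all. This is one possible choice; warning about a missing config_version would be equally easy.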
On Wed, 27 Oct 2004, Tom Anderson wrote:
> I also think that would be kludgy. Bogofilter -Q, if I recall correctly,
> outputs the config info. Any utility scripts should be updated to get the
> appropriate format from there.
1. The need to grok the output of -Q adds extra complexity. And
there is -T (as pointed out by Matthias), which is probably the best
solution whenever machine-readable output is needed (or not...
see below).
2. It would not work for scripts that have to process both historical
and new data, e.g. a script that processes a mail archive and compares
the scores and classifications computed when each message arrived
(recorded in X-Bogosity) with those computed using the current
configuration and database.
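For point 2, a rough Python sketch of what such a script has to cope with. It assumes, without checking every release, that older messages carry two-state "Yes"/"No" tags and newer ones "Spam"/"Ham"/"Unsure". It also assumes the header value looks like "Spam, tests=bogofilter, spamicity=0.999999, ...". TAG_MAP, parse_x_bogosity() and classification_changed() are hypothetical names. The mapping for the archived tags has to live in the script itself, because -Q of the currently installed version knows nothing about older formats.

```python
# Assumed tag vocabularies: old two-state tags and newer tri-state tags,
# normalised to one set of lower-case classifications.
TAG_MAP = {
    "yes": "spam",
    "no": "ham",
    "spam": "spam",
    "ham": "ham",
    "unsure": "unsure",
}


def parse_x_bogosity(value):
    """Parse an X-Bogosity header value into (classification, spamicity).

    Assumed shape: "Spam, tests=bogofilter, spamicity=0.999999, ...".
    Either part is None when it cannot be recognised.
    """
    fields = [f.strip() for f in value.split(",")]
    classification = TAG_MAP.get(fields[0].lower())
    spamicity = None
    for field in fields[1:]:
        key, sep, val = field.partition("=")
        if sep and key.strip() == "spamicity":
            try:
                spamicity = float(val)
            except ValueError:
                spamicity = None
    return classification, spamicity


def classification_changed(archived_value, current_tag):
    """True if the archived and current classifications are known and differ."""
    old, _ = parse_x_bogosity(archived_value)
    new = TAG_MAP.get(current_tag.strip().lower())
    return old is not None and new is not None and old != new
```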
On Thu, 28 Oct 2004, Matthias Andree wrote:
> We have an "invariant terse mode" for use in scripts, bogofilter -T.
Great! I guess I should read the documentation more carefully (see
above). :P It might be a good idea to stress the existence of this
feature in the release notes so that people fixing their scripts fix them
the right way.
Unfortunately, there is a small catch: -T prints the score using %g, so
one needs to parse the resulting text fully to make any sense of it
(farewell, sed, I loved you...<g>). There is -TT, which prints the score
in a fixed format, but it does not print the classification.
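The %g catch can be shown with a small Python sketch, since Python's % operator follows C's printf rules for %g and %f. The sketch hypothetically assumes a terse line that consists of a tag followed by the score. parse_terse() and naive_score() are illustrative names and are not part of bogofilter. A sed-style [0-9.]+ pattern reads "1e-05" as 1, so an almost certain ham looks like certain spam. float(), by contrast, handles every form %g can produce.

```python
import re


def parse_terse(line):
    """Split a terse result line into (tag, score).

    Hypothetical format: "<tag> <score>", where the score was printed with
    %g and may look like "0.5", "1" or "1e-05". Raises ValueError if the
    line does not have exactly two fields or the score is not a number.
    """
    tag, score_text = line.split()
    return tag, float(score_text)  # float() accepts exponent notation


def naive_score(score_text):
    """What a simple sed-style [0-9.]+ extraction would yield, for comparison."""
    match = re.match(r"[0-9.]+", score_text)
    return float(match.group()) if match else None


if __name__ == "__main__":
    # %g changes shape with the magnitude of the value; %0.6f does not.
    for score in (0.5, 1.0, 0.00001):
        print("%g" % score, "%0.6f" % score)
```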
Hmmm...I could also run bogofilter with explicit --spamicity_tags,
--spamicity_formats et al. to get predictable output in a desired
format...
--Pavel Kankovsky aka Peak [ Boycott Microsoft--http://www.vcnet.com/bms ]
"Resistance is futile. Open your source code and prepare for assimilation."
More information about the Bogofilter mailing list