option proliferation [was: [bogofilter-announce] Bogofilter-0.14.1 - New Current Release]

David Relson relson at osagesoftware.com
Fri Aug 1 17:19:30 CEST 2003


Matthias,

Bogofilter _does_ have many options.  Since it first appeared a year ago, 
it has changed and grown.  It has become much more capable and more flexible.

The initial implementation used the Graham algorithm.  When the Robinson-GM 
algorithm was added, some people were pleased and switched over 
happily.  Others stayed with the Graham algorithm.  We decided to support 
both algorithms, which led to configuration options (to reduce executable 
size for people who wanted only one algorithm) and command line switches 
(to select the algorithm at run time).

The Robinson-Fisher algorithm was implemented after that.  Again 
bogofilter produced better scoring results.  Again people didn't want to 
change.  Again we added more options, this time to allow selecting among 3 
algorithms and to allow two-state scoring (ham/spam) or three-state scoring 
(ham/spam/unsure).
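
To make the two-state versus three-state distinction concrete, here is a 
minimal sketch in Python.  It is not bogofilter's code: the function name 
and the cutoff values (0.95 and 0.10) are assumptions chosen only for 
illustration.

```python
def classify(score, spam_cutoff=0.95, ham_cutoff=0.10, three_state=True):
    """Map a spamicity score in [0, 1] to a label.

    The cutoff values are hypothetical, not bogofilter's defaults.
    """
    if score >= spam_cutoff:
        return "spam"
    if not three_state:
        # Two-state scoring: everything below the spam cutoff is ham.
        return "ham"
    if score <= ham_cutoff:
        return "ham"
    # Three-state scoring: the middle band is reported as unsure.
    return "unsure"
```

With these assumed cutoffs, a score of 0.5 is "unsure" under three-state 
scoring and "ham" under two-state scoring.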

More recently, Paul Graham's new article described new ways of improving 
classification.  Our tests confirmed the clear value of case sensitivity, 
tagging selected header lines, and tagging selected html types.  My tests 
of his token degeneration algorithm indicate it is _not_ of value, but more 
tests should be run by others.
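
As a rough illustration of what case sensitivity and header tagging mean 
for tokens, here is a small Python sketch.  It is not bogofilter's lexer: 
the token pattern, the tag prefixes, and the function name are all 
assumptions made up for this example.

```python
import re

# Hypothetical token pattern: letters, digits and a few punctuation marks.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'!-]+")

# Hypothetical tag prefixes for selected header lines.
HEADER_TAGS = {"subject": "subj:", "from": "from:", "to": "to:"}


def tokenize(header_name, text, fold_case=False):
    """Split text into tokens, tagging those found in a selected header.

    header_name is None for body text.  With fold_case=False the tokens
    keep their original case, so "FREE" and "free" stay distinct.
    """
    tag = HEADER_TAGS.get(header_name.lower(), "") if header_name else ""
    tokens = TOKEN_RE.findall(text)
    if fold_case:
        tokens = [t.lower() for t in tokens]
    return [tag + t for t in tokens]
```

Under these assumptions, "FREE" in a Subject line and "free" in the body 
become different tokens, so each can carry its own spam and ham counts.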

Details aside, the new capabilities were liked by some people and disliked 
by others.  To maintain usefulness to the largest number of people, 
additional options were added so people can use bogofilter the way they want.

Result:  bogofilter has a variety of ways of handling many different 
situations and it _does_ complicate the code.  We _could_ reduce the number 
of options by keeping the best features, i.e. hard-wiring some (many?) of 
the defaults and deleting the other code.  Making big changes like this 
would force many users to change their usage of bogofilter, something I 
don't think is necessary or worthwhile.

At 05:58 AM 8/1/03, Matthias Andree wrote:
>David Relson <relson at osagesoftware.com> writes:

...[snip]...

> > 0.14.1        2003-07-31
> >
> > * Implemented named exitcodes, with Unsure having its own value (2)
> >    and changing the value for error from 2 to 3.
>
>This isn't the right thing to do. We cannot change established exit
>codes at will. 2 was error, and 2 must remain "error". There is no
>reason why unsure cannot be "3". If the command line allows to rearrange
>exit code mappings, that's fine, but the default must remain what it has
>been over the past year. Tomorrow someone maps "unsure, maybe spam" and
>"unsure, maybe ham" to 3 and 4 and error moves from 2 over 3 to 5,
>breaking scripts and existing setups again. This was discussed on
>bogofilter-dev, and I haven't seen any counter evidence to the
>suggestion to use 3 for "unsure".

When I queried the list about this, everyone else liked the idea.  You were 
against it, but didn't offer an alternative (AFAICT).  We _could_ change it 
as you suggest.  Alternatively, if we need additional exit codes in the 
future we can make them higher, i.e. 4, 5, etc.
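
For scripts that depend on the exit code, a wrapper can name the values 
in one place so that a renumbering only touches one table.  A minimal 
Python sketch follows.  The values for unsure (2) and error (3) are the 
0.14.1 values quoted above; the values for spam (0) and ham (1) are not 
stated in this thread and are assumed here, and the names ExitCode and 
describe are hypothetical.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    SPAM = 0    # assumed, not stated in this thread
    HAM = 1     # assumed, not stated in this thread
    UNSURE = 2  # 0.14.1 value from the release notes quoted above
    ERROR = 3   # 0.14.1 value (was 2 in earlier releases)


def describe(returncode):
    """Translate a process exit status into a lowercase label.

    Unknown codes are reported as "error", so exit codes added later
    (4, 5, ...) are never mistaken for spam or ham by an old wrapper.
    """
    try:
        return ExitCode(returncode).name.lower()
    except ValueError:
        return "error"
```

A script would call describe() on the status returned by the filter and 
branch on the label rather than on the raw number.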

>Bogofilter suffers from a severe optionitis (proliferation of options),
>we have options to cater for any personal preference, whether there is a
>technical need or no.
>
>WE NEED NO OPTIONS TO CONFIGURE IF SOMEONE WANTS Y/N/? OR S/H/U OR 1/0/?.

'Tis true.  They aren't necessary.  They are niceties and _are_ used.

>All these options, particularly if changing established behaviour, make
>supporting the software difficult and prone to failure. This violates
>the simplest principle: keep it simple, stupid.

As we've learned more about processing email, we've learned that 
established behavior can be wrong.  Even so, I think that dropping the old 
way and supporting only the new way would break more than it fixes.

>MOST of the code growth of the past months has gone into convenience
>options that aren't necessary. The core functionality except for the
>lexer has hardly changed, so the next directives (before 1.0) are
>throwing all that accumulated cruft out again, like removing half of the
>options. Please remember that ESR started this project, and his new book
>mentions some of the principles that we should adhere to.

I do agree that the core functionality, i.e. the Robinson-Fisher algorithm, 
has remained stable.

You're overlooking multiple speed enhancements, particularly the combined 
wordlist.  There's also been profiling work that has resulted in rewriting 
multiple code "hot spots".

>There is really no need for bogofilter to understand different mailbox
>formats, this is something a wrapper can do that we will ship.

True, it's not _necessary_.  It _is_ valuable to some of our big users.

>There is no need either for convenience changes to let the user
>configure exit codes, and we shouldn't offer support for too many ways
>to run bogofilter either. Bogofilter's duty is to evaluate spam, not to
>accommodate taste or integrate with everything. Software needs to be
>adapted when integrated into a system at large, this has always been the
>case and will be the case.
>
>We should refactor the code so that we have a libbogofilter that other
>software (bogofilter and bogoutil for a start, later a client-server
>model) can use, then everybody can write his own wrapper interface,
>possibly in Perl, to accommodate his needs. Such stuff does not belong
>into bogofilter itself.

It seems like we already have this in the build process.  libbogofilter.a 
is built and then linked into the various executables - bogofilter, 
bogolexer, and bogoutil.

Not having recently reviewed what functions are in which files, I suspect 
that the structure is less clean than it used to be.  Refactoring the code 
likely would be a good thing.

>We _NEED_ to cut the option count to half of what it is now. We'll have
>less code paths and easier debugging.

Suggest which options should be deleted and we can see what people think of 
the list.

If I were to take on the task, I'd choose to keep the options I use and 
discard the others.  I doubt that my selections would please many others.

>Putting on the packager's hat: I will not ship or approve updates for
>bogofilter on FreeBSD unless this is solved.

Fair enough.  Please define "solved" so we know what target we're aiming for.

David





