Code Clean-Up - Phase 1

David Relson relson at osagesoftware.com
Thu Jan 1 19:25:45 CET 2004


Greetings,

The process of cleaning bogofilter's code base has begun.  Briefly
stated, version 0.16.0 is Phase 1, which will deactivate code that
exists for compatibility with older versions of bogofilter, and version
0.17.0 is Phase 2, which will remove that code.  The goal is to release
version 1.0 after 0.17.X reaches a stable state.

Below is a copy of RELEASE.NOTES-0.16.

David


		       Code Clean-Up - Phase 1
		       -----------------------

                             Introduction
                             ------------

Bogofilter was released over a year ago and has continually been
extended, corrected, enhanced, and refined.  Over this time it has
evolved from a simple Bayesian filter to a sophisticated filter that
understands email, decodes text parts of multi-part MIME messages,
processes html, etc.

During this evolution, old functions have remained in the code and
command-line options have been added to provide compatibility with
older versions.  Many of these functions and options have started
collecting dust - some are not commonly used and others are not
well-tested.

Bogofilter is suffering from creeping featuritis and optionitis.

                      It is time to clean house!

The goal of the bogofilter 0.16 series is to clean out this excess
code and create a core of high quality code. This will necessarily cut
some ties with previous versions, and you may need to adjust your
wrapper scripts to make up for features we have dropped.

The following list is supposed to be complete.  Let us know if we've
omitted anything. We shall try to provide workarounds and migration
paths whenever possible.

                             Feature List
                             ------------

1) Scoring algorithms:

   Bogofilter will support only the Robinson-Fisher algorithm,
   commonly called the "Fisher algorithm".  The Graham algorithm and
   Robinson geometric-mean algorithm, a.k.a. Robinson algorithm, have
   been deprecated.

2) Wordlist support:

   Bogofilter will now support only the combined wordlist, i.e.
   wordlist.db, which contains both the ham and spam counts for each
   token.  The older, separate wordlists (spamlist.db and goodlist.db)
   are no longer supported.  

   The bogoupgrade program can still be used to merge the separate
   databases for you.  Type "bogoupgrade -d /your/wordlist/directory/"
   to do the job.

   Ignore lists, i.e. ignorelist.db, are also being deprecated.  The
   ignore list feature has never been thoroughly tested and is not
   used (as far as we know).

3) BerkeleyDB support:

   Binary RPM packages are now being built with BerkeleyDB-4.1 (or
   newer).

   For convenience, use whatever BerkeleyDB version came with your
   system.  We have tested BerkeleyDB 3.2 and newer, but our testing
   focus is on the recent 4.X releases.  We developers are no longer
   using BerkeleyDB-3.3, but will leave the code in bogofilter to
   allow its continued use.

4) Command line switches:

   Bogofilter will no longer support the switches listed in this
   section.  If any of them is used, bogofilter will print an error
   message and exit.

   Scoring related switches:

        -g - select Graham algorithm
        -r - select Robinson Geometric-Mean algorithm
        -f - select Robinson-Fisher algorithm
        -2 - set binary classification mode
        -3 - set ternary classification mode

        Note:  The Robinson-Fisher algorithm is bogofilter's one and
        only algorithm.  The classification mode switches are
        unnecessary.  Bogofilter will use binary mode if ham_cutoff is
        zero and will use ternary mode (Yes, No, Unsure) if ham_cutoff
        is non-zero and less than spam_cutoff.
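
The rule in the note above can be sketched in C.  This is only an
illustration, not bogofilter's actual code: the names class_mode_t,
classification_mode() and classify() are hypothetical, and so are the
boundary comparisons (">=" for spam, ">" for unsure) and the fallback to
binary mode when ham_cutoff is not below spam_cutoff, since the note
does not specify them.

```c
/* Hypothetical names for illustration; not bogofilter's identifiers. */
typedef enum { MODE_BINARY, MODE_TERNARY } class_mode_t;

/* The mode is inferred from the cutoffs instead of from -2 / -3.
 * Ternary mode needs a non-zero ham_cutoff below spam_cutoff.
 * Assumption: every other combination falls back to binary mode. */
class_mode_t classification_mode(double ham_cutoff, double spam_cutoff)
{
    if (ham_cutoff != 0.0 && ham_cutoff < spam_cutoff)
        return MODE_TERNARY;
    return MODE_BINARY;
}

/* Map a score to "Yes" (spam), "No" (ham) or "Unsure".
 * Assumption: the exact boundary handling is chosen for the sketch. */
const char *classify(double score, double ham_cutoff, double spam_cutoff)
{
    if (score >= spam_cutoff)
        return "Yes";
    if (classification_mode(ham_cutoff, spam_cutoff) == MODE_TERNARY
        && score > ham_cutoff)
        return "Unsure";
    return "No";
}
```

With ham_cutoff set to 0.10 and spam_cutoff to 0.95, a score of 0.50
lands in the "Unsure" band in this sketch.  With ham_cutoff left at
zero, the same score is reported as "No".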

   Wordlist switches:

        -W   - use combined wordlist  for spam and ham tokens
        -WW  - use separate wordlists for spam and ham tokens

        Note:  Combined mode is now the only supported mode.

   Backwards compatible token generation switches:

        -Pi and -PI - ignore_case
        -Pt and -PT - tokenize_html_tags
        -Pc and -PC - strict_check
        -Pd and -PD - degen_enabled
        -Pf and -PF - first_match

        Note: Since May 2003, the default values for these switches
        have been:

            ignore_case         disabled
            tokenize_html_tags  enabled
            strict_check        disabled
            degen_enabled       disabled
            first_match         disabled
            
        There will be no change in the default values.

5) Configuration options:

   The following configuration options (for the above switches) are
   deprecated:

        algorithm

        wordlist
        wordlist_mode

        ignore_case
        tokenize_html_tags
        tokenize_html_script
        header_degen
        degen_enabled
        first_match

   Note:  Bogofilter will print a warning message if it sees any of
   these options, but will run fine anyhow.
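
As a rough sketch of the "warn but keep running" behaviour described in
the note above, a configuration reader could check each option name
against the deprecated list.  The function name warn_if_deprecated()
and the wording of the warning are assumptions made for this example.
Only the option names come from the list above.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Option names taken from the deprecated list in section 5. */
static const char *const deprecated_options[] = {
    "algorithm", "wordlist", "wordlist_mode", "ignore_case",
    "tokenize_html_tags", "tokenize_html_script", "header_degen",
    "degen_enabled", "first_match",
};

/* Hypothetical helper: print a warning and return true when the
 * option is deprecated.  The caller is expected to carry on. */
bool warn_if_deprecated(const char *name)
{
    size_t n = sizeof deprecated_options / sizeof deprecated_options[0];
    size_t i;

    for (i = 0; i < n; i++) {
        if (strcmp(name, deprecated_options[i]) == 0) {
            fprintf(stderr, "warning: option '%s' is deprecated\n", name);
            return true;
        }
    }
    return false;
}
```

Returning a flag instead of exiting mirrors the difference between the
deprecated configuration options, which only produce a warning, and
the deprecated command line switches, which cause an error exit.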

6) Miscellany:

   The user formatted SPAM_HEADER will no longer support format
   specification "%a" (for algorithm) since bogofilter now has only
   one algorithm.

                           Operational Note
                           ----------------

With the 0.16.0 release, a number of features have been deprecated.
The relevant code is bracketed by "#ifdef ENABLE_DEPRECATED_CODE" and
"#endif" statements.  The default build will not include the
deprecated features.  For those who still need these features,
configure option "--enable-deprecated-code" exists to allow them to be
turned on.
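
A minimal sketch of how such bracketing looks, assuming a hypothetical
helper switch_is_accepted() that is not part of bogofilter.  Only the
macro name ENABLE_DEPRECATED_CODE and the switch letters are taken
from these notes.

```c
#include <stdbool.h>

/* Hypothetical helper: reports whether a command line switch is
 * accepted by this build. */
bool switch_is_accepted(char opt)
{
    switch (opt) {
    case 'v':   /* stands in for any switch that is still supported */
        return true;
#ifdef ENABLE_DEPRECATED_CODE
    /* Compiled only when the macro is defined, i.e. in builds
     * configured with --enable-deprecated-code. */
    case 'g':   /* Graham algorithm */
    case 'r':   /* Robinson geometric-mean algorithm */
    case 'f':   /* Robinson-Fisher algorithm */
        return true;
#endif
    default:
        return false;
    }
}
```

In a default build the macro is undefined, the three case labels are
never compiled, and 'g', 'r' and 'f' fall through to the default
branch.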

				 Plan
				 ----

Bogofilter 0.16.0 will be the "Code Clean-Up - Phase 1" release.  The
"deprecated" state will exist until 0.16.X is promoted to "stable"
status, or for a month, whichever is longer.

Bogofilter 0.17.0 will be the "Code Clean-Up - Phase 2" release.  All
the deprecated code will be removed.



