bogofilter-0.93.2 - New Current Release
relson at osagesoftware.com
Fri Dec 3 19:25:52 EST 2004
The 0.93.2 release of bogofilter brings with it a variety of bug fixes,
usability enhancements, and documentation updates. As in the 0.93.1
release, these changes mostly pertain to bogofilter's use of Berkeley
DB's Transaction capability.
Be sure to read file RELEASE.NOTES, available on SourceForge:
Here's the summary of the two major changes that comprised the heart of
the previous release - bogofilter-0.93.0:
1) Bogofilter now uses BerkeleyDB's transactional capability to ensure
database integrity. Berkeley DB uses additional files in the wordlist
directory to keep state and logging information. See file doc/README.db
for important info.
2) Bogofilter now defaults to tri-state configuration using cutoff
values of 0.45 and 0.99 (for ham_cutoff and spam_cutoff, respectively).
The ham_cutoff value is new and spam_cutoff is unchanged. With these
cutoffs, messages with scores between 0.45 and 0.99 are classified as
Unsure.
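
As a rough illustration of the default cutoffs described above, the
following Python sketch maps a score to one of the three labels. The
function name classify and the treatment of scores exactly equal to a
cutoff are assumptions made for this example; they are not taken from
bogofilter's source.

```python
HAM_CUTOFF = 0.45   # default ham_cutoff named in this announcement
SPAM_CUTOFF = 0.99  # default spam_cutoff named in this announcement


def classify(score, ham_cutoff=HAM_CUTOFF, spam_cutoff=SPAM_CUTOFF):
    """Return "Spam", "Ham", or "Unsure" for a score in [0, 1].

    Assumption: a score equal to a cutoff is counted as Spam or Ham
    respectively; bogofilter's exact boundary handling may differ.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= spam_cutoff:
        return "Spam"
    if score <= ham_cutoff:
        return "Ham"
    return "Unsure"
```

With the defaults, a score of 0.70 falls between the two cutoffs and is
reported as "Unsure".
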
In tri-state mode, messages are scored as "Spam", "Ham", or "Unsure"
rather than just "Yes" or "No". This affects the "X-Bogosity:" line, and
you may need to change scripts in procmail, maildrop, etc. and filters in
Files are available at http://sourceforge.net/projects/bogofilter for
Here are the md5sums for the release:
NOTE: More information on important changes for bogofilter updaters
is in the RELEASE.NOTES files. Read them!!
RELEASE.NOTES has two important sections entitled:
INCOMPATIBLE CHANGES IN BOGOFILTER 0.93
and MAJOR CHANGES IN BOGOFILTER 0.93
** Bogofilter is now using Berkeley DB's Transaction
capability to ensure database integrity.
** Bogofilter is now generating tri-state results labeled
Spam, Ham, and Unsure, compared to the old two-state Yes/No
!!!!!!!! READ THE RELEASE.NOTES !!!!!!!!
* New script bf_resize DIR that checks the sizes of all databases in an
environment and writes a lock size to DB_CONFIG.
* Accuracy fix: message counts of ignore lists (that can be present)
will be ignored and no longer skew the spamicity.
* Allow environment to be group writable, reported by Fletcher Mattox.
* Accuracy fix: no longer pretend that we had seen an empty message
registered when there was no registration. Use ROBX for spamicity.
This changes the output format of bogofilter -vvv mode when no spam
or no ham messages have been registered previously.
* Support for Berkeley DB 3.0 was explicitly removed again, so that no
stable bogofilter version since 0.17.5 will have had support for this
version. This eliminates the need for on-disk database format
upgrades and keeps things simple.
As the unadvertised breaking of BDB 3.0 didn't raise a single
complaint and 3.1 has been around since July 2000, this should be
* Support long options in bogoutil.
* Add --remove-environment DIR long option to bogoutil, to remove the
environment. Only one such option can be used and there is no
corresponding short option.
* Remove useless numeric Berkeley DB error codes from error messages.
* bogofilter processes will refuse to open multiple wordlists in
different database environments (directories) when the transactional
Berkeley DB datastore is compiled in (the default). The non-transactional
(--disable-transactions), QDBM and TDB datastores are unaffected.
* bogotune now uses getopt() to process the argument list,
hence requires a '-n' flag before each non-spam file and a
'-s' flag before each spam file.
* bogotune now accepts '-x flags' to set debug flags.
* Make scoring one huge transaction, rather than one individual
transaction per token. This fixes consistency and should improve
WARNING: this seems to have broken bogotune, which, BTW, doesn't
return errors to the test suite (t.bulkmode, with message-count
files); it reports a bogus "PASS" in spite of database PANICs.
* Restored the old traditional Berkeley DB datastore that cannot be
recovered. Its use is discouraged; to use it, type
* Restored the error message when recovery is attempted on QDBM
databases, which was lost in the DEPOT (hash) -> VILLA (B+tree) switch.
* Added utility script bf_tar.
* Added utility scripts bf_copy and bf_compact.
* Added BerkeleyDB warning for binary rpm users.
* New entries in bogofilter-faq.html on error messages
"Lock table is out of available locks" and
"Lock table is out of available object entries"
* Add %u formatting option to print login or user ID information,
SourceForge Feature Request #1056729.
* The README.db file now has information on the DB_CONFIG file that
can be created and used to configure the Berkeley DB module.
* Bogofilter's config file now supports setting max lock and
object counts for Berkeley DB using options
* Bogofilter and bogoutil now allow these options on the
command line, as:
* When running database recovery automatically, don't let go of the
lockfile, so we can do our actual work subsequently.
* Support for BerkeleyDB 4.3 was added. We'll avoid DB_NOSYNC on
DB->close() when DB_LOG_INMEMORY is configured for now.
* Update manual pages/example outputs and filter recipe examples from
"X-Bogosity: yes" to "X-Bogosity: Spam". Fixes Debian bug #280557.
* Bugfix for BerkeleyDB 4.2 support: check the data base flags, not the
environment flags, for DB_TXN_NOT_DURABLE, when determining whether
DB_NOSYNC is safe on DB->close(). May fix some kinds of database
corruption encountered with DB_TXN_NOT_DURABLE.
* Return DB_VERSION_STRING contents in -V (version) output when
compiled against Berkeley DB. Minor change to the output format.
* Unify and clean up the horrible RELEASE.NOTES-*, CHANGES* and NEWS-*
mess with lots of duplicated info.
There shall only be one RELEASE.NOTES file and one NEWS file.
RELEASE.NOTES shall contain important information for updates.
NEWS shall contain noteworthy code changes in technical detail.
This also removes the confusion that RELEASE.NOTES didn't contain
information relevant for 0.93.X.
* Berkeley DB mode: do not create data base in read mode (properly map
open_mode to DB_RDONLY flag, store open_mode).
* Berkeley DB mode: exit with error code if lock file cannot be
created. Attempt recovery even if creation of lock file succeeded.
* Fixed negative buffer index in mime.c
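
The entries above on bf_resize, the new FAQ items about exhausted lock
tables, and the README.db notes all refer to the DB_CONFIG file in the
Berkeley DB environment directory. As a minimal sketch, the Python
helper below writes the two Berkeley DB lock-table directives,
set_lk_max_locks and set_lk_max_objects, to such a file. The helper name
write_db_config and the example value 16384 are hypothetical; see
doc/README.db for values and procedures that suit your wordlist.

```python
import os


def write_db_config(env_dir, max_locks, max_objects):
    """Write lock-table limits to DB_CONFIG inside env_dir.

    Note: this overwrites any existing DB_CONFIG in that directory.
    """
    if max_locks <= 0 or max_objects <= 0:
        raise ValueError("limits must be positive integers")
    lines = [
        "set_lk_max_locks %d" % max_locks,      # Berkeley DB directive
        "set_lk_max_objects %d" % max_objects,  # Berkeley DB directive
    ]
    path = os.path.join(env_dir, "DB_CONFIG")
    with open(path, "w") as config:
        config.write("\n".join(lines) + "\n")
    return path
```

For example, write_db_config("/path/to/wordlist/dir", 16384, 16384)
would create a two-line DB_CONFIG; the directory path here is a
placeholder.
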
0.93.0 2004-11-06 "Broken compatibility" release
* Fix bogotune's '-D' option.
* Use only reentrant functions in the signal handler that runs
periodically to check for crashed processes.
Reported by Pavel Kankovsky.
* Add a debugged and enhanced version of Stefan Bellon's QDBM
* Broke QDBM compatibility with 2004-10-30 change, check unsigned
characters to match Berkeley DB behavior of bogoutil -d.
* Rearranged flag setting for Berkeley DB data store, so as only to set
DB_CHKSUM[_SHA1] when creating the data base.
Fixes "checksum error: catastrophic recovery required" and
consequential "wordlist.db: page 1: reference count overflow" errors
Reported by Torsten Veller.
* Revised RELEASE.NOTES-0.93 to move QDBM change into "Incompatible
Changes" section and to mention BerkeleyDB dump/load for 4.1 and 4.2
to add checksums.
* Inserted new section 2.2 into doc/README.db to mention that it is
recommended to dump/load the data base when using BerkeleyDB 4.1 and
* Converted QDBM from hash files (DEPOT API) to B+ trees
(Villa API) for better speed (Stefan Bellon).
* Attempting recovery with TDB or QDBM data bases results in an error,
so the user does not think it succeeded.
* Document that recovery only works for Berkeley DB, but not TDB or
* Merged Transactional branch (for BerkeleyDB) back into the trunk.
Further changes below.
* Added GETTING.STARTED document.
* Changed default mode from two-state to three-state
- with ham_cutoff=0.45 and spam_cutoff=0.99
The ham_cutoff value is new and spam_cutoff is unchanged.
- changed the "Yes/No" tags used in the "X-Bogosity:" line
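
Scripts that match on the "X-Bogosity:" line need to recognise the new
labels. The Python sketch below shows one way a filter script might
read the label while still accepting the old Yes/No values. The
function name bogosity_label is hypothetical, and two things are
assumptions of this example rather than documented behaviour: that the
label is the first comma-separated field of the header value, and that
a legacy "No" is best treated as "Ham".

```python
CURRENT_LABELS = {"spam": "Spam", "ham": "Ham", "unsure": "Unsure"}
# Assumption: the old two-state "No" is mapped to "Ham" here.
LEGACY_LABELS = {"yes": "Spam", "no": "Ham"}


def bogosity_label(header_line):
    """Return "Spam", "Ham", or "Unsure" for an X-Bogosity header line.

    Returns None for other headers or unrecognised values.
    """
    name, sep, value = header_line.partition(":")
    if not sep or name.strip().lower() != "x-bogosity":
        return None
    # Assumption: the label is the first comma-separated field.
    first = value.split(",", 1)[0].strip().lower()
    if first in CURRENT_LABELS:
        return CURRENT_LABELS[first]
    return LEGACY_LABELS.get(first)
```

A script built this way can route "Unsure" messages to a separate
folder for review instead of treating every non-spam result as ham.
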
NOTE: the next entries appear to be out of order; the pertinent changes
were developed on a side branch of bogofilter and merged for
bogofilter 0.93.0.
* bogofilter can now be used with Berkeley DB 3.0 or 3.1 although this
is not recommended. You should prefer 4.2 or 4.1 instead.
UPDATE: support for 3.0 was later removed on 2004-11-29
* Documentation on the write cache issue (recoverability of data bases)
has been revised.
* Updated doc/README.db with a section on the log file size and
pointers to db_checkpoint and db_archive.
2004-09-03 (txn 2.1)
* The on-line crash detector would consider its own process a zombie,
so all processes that lasted 30 s or longer would abort themselves
after that period.
This was particularly prominent with BerkeleyDB 4.1 with
x86/gcc-assembly mutexes as this combination appears rather slow when
facing lock contention, causing t.lock3 failure. BDB 4.1 compiled to
use POSIX mutexes (where working) appears to be a lot faster in this
2004-09-01 (txn 2.0)
* Hook up crash detection code. Bogofilter is now able to detect
when recovery is necessary and should detect stalled data bases
within 30 seconds.
NOTE: this means if one process crashes all other processes
accessing the same data base will abort with an error code.
Stalled data bases happen when one process or the system crashes and
doesn't have a chance to clear its locks.
This code uses ideas from Matthias Andree and Pavel Kankovsky.
2004-08-23 (txn 1.1)
* Add -f and -F options to bogoutil (mnemonic: fix) to run data base
* Reimplement our own locking so that recovery and data base access
don't collide and no two processes try running recovery at the same