bogofilter 0.93.4 - new current release
relson at osagesoftware.com
Sun Jan 9 10:39:19 EST 2005
Version 0.93.4 includes additional tweaks for supporting Berkeley DB
transactions, new ./configure options for Russian language support
(specifically "--enable-russian" and "--default-charset=charsetname"),
and a common "long options" file shared by bogofilter, bogolexer, and
bogoutil. The usual assortment of small changes is also included. Read
below for further details.
Files are available for download at http://sourceforge.net/projects/bogofilter
Here are the md5sums for the release:
NOTE: More information on important changes for bogofilter updaters
is in the RELEASE.NOTES file. Read it!!
RELEASE.NOTES has two important sections entitled:
INCOMPATIBLE CHANGES IN BOGOFILTER 0.93
and MAJOR CHANGES IN BOGOFILTER 0.93
** Bogofilter now uses Berkeley DB's transaction
capability to ensure database integrity.
** Bogofilter now generates tri-state results labeled
Spam, Ham, and Unsure, compared to the old two-state Yes/No
!!!!!!!! READ THE RELEASE.NOTES !!!!!!!!
* Fixed miscellaneous compiler warnings.
* Minor refactoring of charset code.
* Fixed --enable-transactions logic, which had been inverted since
2004-12-26 (affected release: 0.93.3.1 only; 0.93.3 was fine). Note that
giving neither --enable-transactions nor --disable-transactions would
still enable transactions.
* bogoutil now reads the configuration files to know the user's
db_lk_max_* values. Reported by Karl O. Pinc.
* Added '--config-file=name' to bogoutil to name a config file and
'-C' option to suppress reading a config file.
* Berkeley DB Transactional recovery now uses the actual db_lk_max_*
values rather than hardcoded 1024. Reported by Karl O. Pinc.
* Added '--default-charset=name' option to configure script.
* Initial support for Russian character sets (thanks to Evgeny
* Bogolexer man page: replaced incorrect references to
'bogofilter' with 'bogolexer'.
* Added '-O file' to specify bogolexer output file.
* Fixed bogoutil --db-remove-environment DIR, which previously just aborted.
* Fixed several memory leaks.
* Refactored long option code. All definitions are now in one
file (longoptions.c). The bogolexer and bogoutil programs now
ignore options that don't apply to them, rather than aborting.
* Long options used in the config file use underscores in
their names. Used on the command line, they have hyphens.
This fixes a problem where some options had hyphens and some
* Fixed errors in bogoutil's usage and help messages.
* Bogoutil's options for maintaining the database environment
are all long options with a "db-" prefix.
* Bogoutil's help message and man page include the new long
* Early Christmas Gift: Bogofilter now supports SQLite v3.
Requires SQLite v3.0.8. See the RELEASE.NOTES.
* Internal cleanup: Move transaction handling back into database space,
and let the database backend driver map this into the environment if
* Portability fix for BerkeleyDB versions 3.1 and 3.2:
log_archive expects a fourth argument.
* lexer_v3 HTML parser fix for urlencoded characters, by Krzysztof
Foltman. Speeds up a particular case of malformed mail.
* bogoutil -C file now checks whether the database file 'file' is intact.
(Only implemented for Berkeley DB stores with and without
* bf_compact now uses db_archive without -d option and loops on the
results instead, calling rm in turn for each file. -d is not
supported by older Berkeley DB versions such as 4.0.
* bogoutil -P directory now checkpoints the database and removes
inactive log files. Note you must save the database and remaining log
files, in that order, if you want to be able to recover from
* Limit MIME overflow error messages to one per email.
* configure now checks if Berkeley DB supports shared environments and
suggests workarounds if it doesn't, to aid Fedora Core users.
* New directory doc/programmer/OS2 contains the configure.os2
script contributed by Yuri Dario.
* New script bf_resize DIR that checks the sizes of all databases in an
environment and writes a lock size to DB_CONFIG.
* Accuracy fix: message counts of ignore lists (if present) are now
ignored and no longer skew the spamicity.
* Allow environment to be group writable, reported by Fletcher Mattox.
* Accuracy fix: no longer pretend that we had seen an empty message
registered when there was no registration. Use ROBX for spamicity.
This changes the output format of bogofilter -vvv mode when no spam
or no ham messages have been registered previously.
* Support for Berkeley DB 3.0 was explicitly removed again, so that no
stable bogofilter version since 0.17.5 will have had support for this
version. This eliminates the need for on-disk database format
upgrades and keeps things simple.
As the unadvertised breaking of BDB 3.0 didn't raise a single
complaint and 3.1 has been around since July 2000, this should be
* Support long options in bogoutil.
* Add --remove-environment DIR long option to bogoutil, to remove the
environment. Only one such option can be used and there is no
corresponding short option.
* Remove useless numeric Berkeley DB error codes from error messages.
* bogofilter processes will refuse to open multiple wordlists in
different database environments (directories) when the transactional
Berkeley DB datastore is compiled (default). The non-transactional
(--disable-transactions), QDBM and TDB datastores are unaffected.
* bogotune now uses getopt() to process the argument list,
hence requires a '-n' flag before each non-spam file and a
'-s' flag before each spam file.
* bogotune now accepts '-x flags' to set debug flags.
* Make scoring one huge transaction, rather than one individual
transaction per token. This fixes consistency and should improve
WARNING: this seems to have broken bogotune, which, BTW, doesn't
return errors to the test suite (t.bulkmode, with message-count
files), it reports a bogus "PASS" in spite of database PANICs.
* Restored the old traditional Berkeley DB datastore that cannot be
recovered. Its use is discouraged. To use it, type
* Restored the error message when recovery is attempted on QDBM
databases, was lost in the DEPOT (hash) ->VILLA (B+tree) switch.
* Added utility script bf_tar.
* Added utility scripts bf_copy and bf_compact.
* Added BerkeleyDB warning for binary rpm users.
* New entries in bogofilter-faq.html on error messages
"Lock table is out of available locks" and
"Lock table is out of available object entries"
* Add %u formatting option to print login or user ID information,
SourceForge Feature Request #1056729.
* The README.db file now has information on the DB_CONFIG file that
can be created and used to configure the Berkeley DB module.
* Bogofilter's config file now supports setting max lock and
object counts for Berkeley DB using options
* Bogofilter and bogoutil now allow these options on the
command line, as:
* When running database recovery automatically, don't let go of the
lockfile, so we can do our actual work subsequently.
* Support for BerkeleyDB 4.3 was added. For now, we avoid DB_NOSYNC on
DB->close() when DB_LOG_INMEMORY is configured.
* Update manual pages/example outputs and filter recipe examples from
"X-Bogosity: yes" to "X-Bogosity: Spam". Fixes Debian bug #280557.
* Bugfix for BerkeleyDB 4.2 support: check the data base flags, not the
environment flags, for DB_TXN_NOT_DURABLE, when determining whether
DB_NOSYNC is safe on DB->close(). May fix some kinds of database
corruption encountered with DB_TXN_NOT_DURABLE.
* Return DB_VERSION_STRING contents in -V (version) output when
compiled against Berkeley DB. Minor change to the output format.
* Unify and clean up the horrible RELEASE.NOTES-*, CHANGES* and NEWS-*
mess with lots of duplicated info.
There shall only be one RELEASE.NOTES file and one NEWS file.
RELEASE.NOTES shall contain important information for updates.
NEWS shall contain noteworthy code changes in technical detail.
This also removes the confusion that RELEASE.NOTES didn't contain
information relevant for 0.93.X.
* Berkeley DB mode: do not create data base in read mode (properly map
open_mode to DB_RDONLY flag, store open_mode).
* Berkeley DB mode: exit with error code if lock file cannot be
created. Attempt recovery even if creation of lock file succeeded.
* Fixed negative buffer index in mime.c
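The long-option entry above says that options in the config file use underscores while the same options on the command line use hyphens. A minimal sketch of how one shared option table could serve both spellings, assuming names are stored in hyphenated form and config-file names are converted before lookup. The table entries (ham-cutoff, spam-cutoff, db-lk-max-locks) and the helper names are hypothetical illustrations, not the contents of longoptions.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical option table, stored in command-line (hyphenated) form. */
static const char *const option_names[] = {
    "ham-cutoff",
    "spam-cutoff",
    "db-lk-max-locks",
    NULL
};

/* Copy a config-file option name into buf, converting '_' to '-'.
 * The result is always NUL-terminated and truncated to fit. */
void config_to_cli(const char *name, char *buf, size_t len)
{
    size_t i;

    assert(len > 0);
    for (i = 0; name[i] != '\0' && i + 1 < len; i++)
        buf[i] = (name[i] == '_') ? '-' : name[i];
    buf[i] = '\0';
}

/* Return the table index for a config-file option name, or -1 if unknown. */
int lookup_config_option(const char *name)
{
    char buf[64];
    int i;

    config_to_cli(name, buf, sizeof buf);
    for (i = 0; option_names[i] != NULL; i++)
        if (strcmp(option_names[i], buf) == 0)
            return i;
    return -1;
}
```

With this approach a config-file line naming db_lk_max_locks and a command-line --db-lk-max-locks resolve to the same table entry, so each option only has to be defined once.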
0.93.0 2004-11-06 "Broken compatibility" release
* Fix bogotune's '-D' option.
* Use only reentrant functions in the signal handler that runs
periodically to check for crashed processes.
Reported by Pavel Kankovsky.
* Add a debugged and enhanced version of Stefan Bellon's QDBM
* Broke QDBM compatibility with 2004-10-30 change, check unsigned
characters to match Berkeley DB behavior of bogoutil -d.
* Rearranged flag setting for Berkeley DB data store, so that
DB_CHKSUM[_SHA1] is set only when creating the data base.
Fixes "checksum error: catastrophic recovery required" and
consequential "wordlist.db: page 1: reference count overflow" errors
Reported by Torsten Veller.
* Revised RELEASE.NOTES-0.93 to move QDBM change into "Incompatible
Changes" section and to mention BerkeleyDB dump/load for 4.1 and 4.2
to add checksums.
* Inserted new section 2.2 into doc/README.db to mention that it is
recommended to dump/load the data base when using BerkeleyDB 4.1 and
* Converted QDBM from hash files (DEPOT API) to B+ trees
(Villa API) for better speed (Stefan Bellon).
* Attempting recovery with TDB or QDBM data bases results in an error,
so the user does not think it succeeded.
* Document that recovery only works for Berkeley DB, but not TDB or
* Merged Transactional branch (for BerkeleyDB) back into the trunk.
Further changes below.
* Added GETTING.STARTED document.
* Changed default mode from two-state to three-state
- with ham_cutoff=0.45 and spam_cutoff=0.99
The ham_cutoff value is new and spam_cutoff is unchanged.
- changed the "Yes/No" tags used in the "X-Bogosity:" line
NOTE: the next entries appear to be out of order because the pertinent
changes were developed on a side branch of bogofilter and merged
for bogofilter 0.93.0.
* bogofilter can now be used with Berkeley DB 3.0 or 3.1 although this
is not recommended. You should prefer 4.2 or 4.1 instead.
UPDATE: support for 3.0 was later removed on 2004-11-29
* Documentation on the write cache issue (recoverability of data bases)
has been revised.
* Updated doc/README.db with a section on the log file size and
pointers to db_checkpoint and db_archive.
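The 0.93.0 entries describe the switch to three-state classification with ham_cutoff=0.45 and spam_cutoff=0.99. A minimal sketch of mapping a spamicity score onto the Spam, Ham and Unsure labels; the enum, the function names, and the treatment of scores exactly at a cutoff are assumptions for illustration, not bogofilter's source.

```c
#include <assert.h>

/* Three-state classification result (hypothetical names). */
enum bogosity { BOGO_HAM, BOGO_UNSURE, BOGO_SPAM };

/* Default cutoffs quoted in the 0.93.0 notes. */
#define HAM_CUTOFF  0.45
#define SPAM_CUTOFF 0.99

/* Map a spamicity in [0,1] to Spam, Ham or Unsure.
 * Assumption: a score exactly at a cutoff counts toward that cutoff's class. */
enum bogosity classify(double spamicity, double ham_cutoff, double spam_cutoff)
{
    assert(ham_cutoff <= spam_cutoff);
    if (spamicity >= spam_cutoff)
        return BOGO_SPAM;
    if (spamicity <= ham_cutoff)
        return BOGO_HAM;
    return BOGO_UNSURE;
}

/* Label of the kind used on the "X-Bogosity:" line. */
const char *bogosity_label(enum bogosity b)
{
    switch (b) {
    case BOGO_SPAM:
        return "Spam";
    case BOGO_HAM:
        return "Ham";
    default:
        return "Unsure";
    }
}
```

With the default cutoffs, a spamicity of 0.995 yields Spam, 0.2 yields Ham, and 0.7 falls into the Unsure band between the two cutoffs.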
2004-09-03 (txn 2.1)
* The on-line crash detector would consider its own process a zombie,
so all processes that lasted 30 s or longer would abort themselves
after that period.
This was particularly prominent with BerkeleyDB 4.1 with
x86/gcc-assembly mutexes as this combination appears rather slow when
facing lock contention, causing t.lock3 failure. BDB 4.1 compiled to
use POSIX mutexes (where working) appears to be a lot faster in this
2004-09-01 (txn 2.0)
* Hook up crash detection code. Bogofilter is now able to detect
when recovery is necessary and should detect stalled data bases
within 30 seconds.
NOTE: this means if one process crashes all other processes
accessing the same data base will abort with an error code.
Stalled data bases happen when one process or the system crashes and
doesn't have a chance to clear its locks.
This code uses ideas from Matthias Andree and Pavel Kankovsky.
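These notes do not say how crashed processes are detected. As a hypothetical illustration only, assuming each process accessing the data base records its PID in some shared table, a POSIX check could probe each recorded PID with kill(pid, 0), which sends no signal but reports whether the process exists. The registry, the function names, and skipping the caller's own PID are assumptions; the last one echoes the 2004-09-03 fix, where the detector mistook its own process for a zombie.

```c
#define _POSIX_C_SOURCE 200112L

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Return 1 if a process with this PID appears to exist, 0 otherwise.
 * kill() with signal 0 only performs error checking; EPERM means the
 * process exists but belongs to another user. PIDs <= 0 are rejected
 * because kill() would interpret them as process groups. */
int pid_alive(pid_t pid)
{
    if (pid <= 0)
        return 0;
    if (kill(pid, 0) == 0)
        return 1;
    return errno == EPERM;
}

/* Hypothetical registry check: return 1 if any recorded PID other than
 * the caller's own no longer exists, i.e. recovery would be needed. */
int peer_crashed(const pid_t *pids, size_t n)
{
    pid_t self = getpid();
    size_t i;

    for (i = 0; i < n; i++) {
        if (pids[i] == self)
            continue;   /* never treat ourselves as crashed */
        if (!pid_alive(pids[i]))
            return 1;
    }
    return 0;
}
```

A PID check like this can only tell that a process is gone, not whether it left locks behind, so a real detector would still need to know which locks belonged to the dead process.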
2004-08-23 (txn 1.1)
* Add -f and -F options to bogoutil (mnemonic: fix) to run data base
* Reimplement our own locking so that recovery and data base access
don't collide and no two processes try running recovery at the same
0.92.8 2004-10-25 - Promoted to Stable Release