bogofilter-0.95.1 - New Current Release
David Relson
relson at osagesoftware.com
Mon Jun 27 03:34:42 CEST 2005
Greetings,
This release fixes some problems with the unicode support in 0.95.0.
Most notably, RFC2047-encoded tokens are now converted to unicode (when
appropriate), and bogotune problems with unicode/raw wordlists have been
corrected.
Enjoy!
David
########################################################################
Files are available at http://sourceforge.net/projects/bogofilter for
download.
Here are the md5sums for the release:
6294cb374a0ed4fd467fd8131ed0c3e6 bogofilter-0.95.1-1.i586.rpm
b371d40b7f5e76d1f5738128ef296c74 bogofilter-0.95.1-1.src.rpm
362cecc36d9e82e8fd62496cb7231faf bogofilter-0.95.1.tar.bz2
f58fa919ad9195ef9d8443c46e7b532d bogofilter-0.95.1.tar.gz
4f7123851091c533538b4375585f6335 bogofilter-static-0.95.1-1.i586.rpm
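A downloaded file can be checked against the sums above along these
lines. This is only a sketch: it assumes the GNU coreutils md5sum
command is installed, and verify_md5 is a hypothetical helper name, not
something shipped with bogofilter.

```shell
# Hypothetical helper: succeeds only if FILE has the EXPECTED md5 sum.
# Assumes GNU coreutils md5sum, which prints "<sum>  <filename>".
verify_md5() {
    expected=$1
    file=$2
    actual=$(md5sum "$file" | awk '{ print $1 }')
    [ "$actual" = "$expected" ]
}

# Usage, after downloading the tarball into the current directory:
#   verify_md5 362cecc36d9e82e8fd62496cb7231faf bogofilter-0.95.1.tar.bz2 \
#       && echo "checksum OK"
```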
########################################################################
Read the NEWS file for a more detailed list of changes since 0.95.0.
Below are the combined RELEASE.NOTES for the unicode releases - 0.95.x
### 0.95.1 ###
Some of the 'make check' scripts use '--unicode=no' and/or
'--unicode=yes' options. These scripts will fail if bogofilter is
configured in a non-default manner, i.e. if --disable-unicode or
--enable-unicode is specified.
### 0.95.0 ###
This release supports unicode (utf-8). A new meta-token .ENCODING has
been added to the wordlist so that bogofilter can determine if it's
using unicode or not. A value of 1 indicates raw storage and 2
indicates utf-8 encoded tokens. Bogofilter checks for this meta-token
and converts incoming text to utf-8 as appropriate.
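As a rough illustration, the encoding of an existing wordlist could be
read from a text dump. The sketch below assumes that "bogoutil -d"
prints the meta-token as a line beginning with ".ENCODING" with its
value in the second field; that layout is an assumption, and the
function names are hypothetical.

```shell
# Map an .ENCODING value to a label, following the description above.
encoding_label() {
    case "$1" in
        1) echo "raw" ;;
        2) echo "utf-8" ;;
        *) echo "unknown" ;;
    esac
}

# Read a wordlist dump on stdin and report its encoding.
# Assumption: the meta-token line looks like ".ENCODING <value> ...".
wordlist_encoding() {
    value=$(awk '$1 == ".ENCODING" { print $2; exit }')
    encoding_label "$value"
}

# Usage (requires bogoutil and an existing wordlist):
#   bogoutil -d wordlist.db | wordlist_encoding
```

A dump with no .ENCODING line is reported as "unknown" here; the sketch
does not try to guess what bogofilter itself does in that case.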
Command line options "--unicode=yes" and "--unicode=no" can be used.
° With bogofilter, they control encoding of newly created
databases.
° With bogoutil, during wordlist maintenance they change wordlist
to/from unicode.
° With bogolexer, they allow viewing parser results in new and
old modes.
./configure options allow bogofilter customization.
° Building with "./configure --enable-unicode" makes bogofilter
_always_ operate in unicode mode
° Building with "./configure --disable-unicode" makes bogofilter
_never_ operate in unicode mode
Wordlists can be converted from raw storage to unicode using:

    bogoutil -d wordlist.db > wordlist.raw.txt
    iconv -f iso-8859-1 -t utf-8 < wordlist.raw.txt > wordlist.utf8.txt
    bogoutil -l wordlist.db.new < wordlist.utf8.txt

or:

    bogoutil --unicode=yes -m wordlist.db

Wordlists can be converted from unicode to raw storage using:

    bogoutil -d wordlist.db > wordlist.utf8.txt
    iconv -f utf-8 -t iso-8859-1 < wordlist.utf8.txt > wordlist.raw.txt
    bogoutil -l wordlist.db.new < wordlist.raw.txt

or:

    bogoutil --unicode=no -m wordlist.db

For a wordlist based on a different charset, for example CP866 or
KOI8-R, use that charset in place of iso-8859-1 in the
dump/convert/load command sequences above.
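
The dump/convert/load sequence can be wrapped so that the charset is a
parameter. This is a minimal sketch: recode_dump and convert_wordlist
are hypothetical helper names, and the charset must be spelled the way
the local iconv expects it.

```shell
# Re-encode a text dump read from stdin; the arguments are the source
# and target charsets as understood by iconv.
recode_dump() {
    iconv -f "$1" -t "$2"
}

# Hypothetical helper: dump SRC_DB, convert it from CHARSET to utf-8,
# and load the result into DST_DB, leaving SRC_DB untouched.
convert_wordlist() {
    src_db=$1
    dst_db=$2
    charset=$3
    bogoutil -d "$src_db" | recode_dump "$charset" utf-8 | bogoutil -l "$dst_db"
}

# Usage (requires bogoutil):
#   convert_wordlist wordlist.db wordlist.db.new koi8-r
```

The pipeline does not check whether the dump or the conversion failed,
so it is worth inspecting the new wordlist before replacing the old one.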
For a wordlist containing tokens from multiple languages, particularly
non-European languages, the conversion methods described above may not
work well. Building a new wordlist from scratch will likely work
better, as the new wordlist will be based solely on unicode.