bogofilter-0.95.0 - New Current Release
David Relson
relson at osagesoftware.com
Tue Jun 21 04:03:13 CEST 2005
Greetings,
This release provides unicode support for new and converted wordlists.
Existing wordlists will use iso-8859-1 (as before). Read the 0.95.0
release notes below for more details.
Enjoy!
David
########################################################################
Files are available at http://sourceforge.net/projects/bogofilter for
download.
Here are the md5sums for the release:
c41eb71bedfd5523b42dae6ae22979c8 bogofilter-0.95.0-1.i586.rpm
0569f9b13347879ead8d6ec977cdd25d bogofilter-0.95.0-1.src.rpm
d1caee67d1734e4bde6f7ca021d14ff8 bogofilter-0.95.0.tar.bz2
9849577cc95cec84bc4fda40f56f1ab1 bogofilter-0.95.0.tar.gz
6db0c2e96899b570df259828745b5994 bogofilter-static-0.95.0-1.i586.rpm
########################################################################
=================
BOGOFILTER NEWS
=================
!!!!!!!! READ THE RELEASE.NOTES !!!!!!!!
Sections headed '[Incompat <version>]' and '[Major <version>]'
are particularly important. They describe changes that are
incompatible with earlier releases or are significantly
different.
!!!!!!!! READ THE RELEASE.NOTES !!!!!!!!
-------------------------------------------------------------------------------
This release supports unicode (utf-8). A new meta-token .ENCODING has
been added to the wordlist so that bogofilter can determine if it's
using unicode or not. A value of 1 indicates raw storage and 2
indicates utf-8 encoded tokens. Bogofilter checks for this meta-token
and converts incoming text to utf-8 as appropriate.
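As a rough sketch of how the meta-token could be inspected, the helper
below maps the .ENCODING value to a readable name, using the 1 = raw,
2 = utf-8 mapping described above. The function names are hypothetical.
It is also an assumption, not something these notes state, that
"bogoutil -d" lists .ENCODING like any other token with its value in the
first numeric column.

```shell
# Hypothetical helper: translate an .ENCODING value into a name.
# The mapping (1 = raw storage, 2 = utf-8) comes from the release notes.
encoding_name() {
    case "$1" in
        1) echo "raw" ;;
        2) echo "utf-8" ;;
        *) echo "unknown" ;;
    esac
}

# Hypothetical helper: read the value from a wordlist dump.
# ASSUMPTION: the dump contains a line starting with ".ENCODING" and the
# value is in the second whitespace-separated field.
wordlist_encoding() {
    value=$(bogoutil -d "$1" | awk '$1 == ".ENCODING" { print $2 }')
    encoding_name "$value"
}
```

With these definitions, "wordlist_encoding wordlist.db" would print
"unknown" for a wordlist that has no .ENCODING token at all.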
Command line options "--unicode=yes" and "--unicode=no" can be used.
° With bogofilter, they control encoding of newly created
databases.
° With bogoutil, during wordlist maintenance they change wordlist
to/from unicode.
° With bogolexer, they allow viewing parser results in new and
old modes.
./configure options allow bogofilter customization.
° "./configure --unicode=yes" builds a bogofilter that _always_ operates in unicode mode
° "./configure --unicode=no" builds a bogofilter that _never_ operates in unicode mode
Wordlists can be converted from raw storage to unicode using:
    bogoutil -d wordlist.db > wordlist.raw.txt
    iconv -f iso-8859-1 -t utf-8 < wordlist.raw.txt > wordlist.utf8.txt
    bogoutil -l wordlist.db.new < wordlist.utf8.txt
or:
    bogoutil --unicode=yes -m wordlist.db
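To see what the iconv step does to an individual token, here is a small
sketch that converts a single iso-8859-1 character and prints its bytes
before and after. The helper names latin1_to_utf8 and hex_bytes are made
up for this illustration. Only iconv and od are needed, so it can be
tried without touching a wordlist.

```shell
# Convert stdin from iso-8859-1 to utf-8, as in the iconv step above.
latin1_to_utf8() {
    iconv -f iso-8859-1 -t utf-8
}

# Print stdin as a compact hex string, for inspecting the bytes.
hex_bytes() {
    od -An -tx1 | tr -d ' \n'
    echo
}

# "é" is the single byte 0xE9 (octal 351) in iso-8859-1.
printf '\351' | hex_bytes                     # e9
printf '\351' | latin1_to_utf8 | hex_bytes    # c3a9
```

Plain ASCII tokens are unchanged by the conversion. Only tokens with
bytes above 0x7F are rewritten, each such byte becoming a two-byte utf-8
sequence.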
Wordlists can be converted from unicode to raw storage using:
    bogoutil -d wordlist.db > wordlist.utf8.txt
    iconv -f utf-8 -t iso-8859-1 < wordlist.utf8.txt > wordlist.raw.txt
    bogoutil -l wordlist.db.new < wordlist.raw.txt
or:
    bogoutil --unicode=no -m wordlist.db
For a wordlist based on a different charset, for example CP866 or
KOI8-R, use that charset in the dump/convert/load command sequences above.
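One way to parameterize the dump/convert/load sequence by charset is
sketched below. The function name convert_wordlist and its argument
order are hypothetical. It uses only the "bogoutil -d", "bogoutil -l"
and iconv invocations shown above, plus mktemp for scratch files. Treat
it as a starting point rather than a tested migration tool, and keep a
backup of the original wordlist.

```shell
# Hypothetical wrapper around the dump/convert/load sequence.
# usage: convert_wordlist CHARSET OLD.db NEW.db
convert_wordlist() {
    charset=$1
    old=$2
    new=$3
    tmpdir=$(mktemp -d) || return 1

    # Same three steps as above, with the source charset as a parameter.
    # Each step runs only if the previous one succeeded.
    bogoutil -d "$old" > "$tmpdir/raw.txt" &&
        iconv -f "$charset" -t utf-8 < "$tmpdir/raw.txt" > "$tmpdir/utf8.txt" &&
        bogoutil -l "$new" < "$tmpdir/utf8.txt"
    status=$?

    rm -rf "$tmpdir"
    return $status
}
```

For a KOI8-R wordlist this would be called as:

    convert_wordlist koi8-r wordlist.db wordlist.db.new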
For a wordlist containing tokens from multiple languages, particularly
non-European languages, the conversion methods described above may not
work well for you. Building a new wordlist (from scratch) will likely
work better as the new wordlist will be based solely on unicode.