bogofilter-0.95.0 - New Current Release

David Relson relson at osagesoftware.com
Tue Jun 21 04:03:13 CEST 2005


Greetings,

This release provides unicode support for new and converted wordlists.
Existing wordlists will use iso-8859-1 (as before).  Read the 0.95.0
release notes below for more details.

Enjoy!

David

########################################################################

Files are available at http://sourceforge.net/projects/bogofilter for
download.

Here are the md5sums for the release:
c41eb71bedfd5523b42dae6ae22979c8  bogofilter-0.95.0-1.i586.rpm
0569f9b13347879ead8d6ec977cdd25d  bogofilter-0.95.0-1.src.rpm
d1caee67d1734e4bde6f7ca021d14ff8  bogofilter-0.95.0.tar.bz2
9849577cc95cec84bc4fda40f56f1ab1  bogofilter-0.95.0.tar.gz
6db0c2e96899b570df259828745b5994  bogofilter-static-0.95.0-1.i586.rpm

########################################################################

			       =================
				BOGOFILTER NEWS
			       =================

	!!!!!!!! READ THE RELEASE.NOTES !!!!!!!!

	Sections headed '[Incompat <version>]' and '[Major <version>]'
	are particularly important.  They describe changes that are
	incompatible with earlier releases or are significantly
	different.

	!!!!!!!! READ THE RELEASE.NOTES !!!!!!!!

-------------------------------------------------------------------------------

This release supports unicode (utf-8).  A new meta-token .ENCODING has
been added to the wordlist so that bogofilter can determine if it's
using unicode or not.  A value of 1 indicates raw storage and 2
indicates utf-8 encoded tokens.  Bogofilter checks for this meta-token
and converts incoming text to utf-8 as appropriate.  
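
As a small illustration of the two values, the shell sketch below maps
an .ENCODING value to its meaning.  The function name describe_encoding
is hypothetical, and reading the value out of a wordlist is not shown.

```shell
# Hypothetical helper: describe an .ENCODING value.
# Per the text above, 1 means raw storage and 2 means utf-8.
describe_encoding() {
    case "$1" in
        1) echo "raw storage" ;;
        2) echo "utf-8" ;;
        *) echo "unknown" ;;
    esac
}

describe_encoding 2
```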

Command line options "--unicode=yes" and "--unicode=no" can be used.

    ° With bogofilter, they control encoding of newly created
      databases.

    ° With bogoutil, they convert the wordlist to/from unicode
      during wordlist maintenance.

    ° With bogolexer, they allow viewing parser results in new and
      old modes.

./configure options allow bogofilter customization.

    ° "./configure --unicode=yes" builds a bogofilter that will
      _always_ operate in unicode mode

    ° "./configure --unicode=no" builds a bogofilter that will
      _never_ operate in unicode mode

Wordlists can be converted from raw storage to unicode using:

    bogoutil -d wordlist.db > wordlist.raw.txt
    iconv -f iso-8859-1 -t utf-8 < wordlist.raw.txt > wordlist.utf8.txt
    bogoutil -l wordlist.db.new < wordlist.utf8.txt

or:

    bogoutil --unicode=yes -m wordlist.db

Wordlists can be converted from unicode to raw storage using:

    bogoutil -d wordlist.db > wordlist.utf8.txt
    iconv -f utf-8  -t iso-8859-1 < wordlist.utf8.txt > wordlist.raw.txt
    bogoutil -l wordlist.db.new < wordlist.raw.txt

or:

    bogoutil --unicode=no -m wordlist.db
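
To see what the iconv step in the sequences above does to a single
line, here is a sketch that needs only iconv, not bogoutil.  The sample
line is made up for illustration, and its layout (a token followed by
counts and a date) is an assumption about the dump format.

```shell
# Made-up dump line; \351 is the iso-8859-1 byte for e-acute.
raw_line=$(printf 'caf\351 3 1 20050621')

# raw -> unicode, as in the first sequence
utf8_line=$(printf '%s\n' "$raw_line" | iconv -f iso-8859-1 -t utf-8)

# unicode -> raw, as in the second sequence
back_line=$(printf '%s\n' "$utf8_line" | iconv -f utf-8 -t iso-8859-1)

# The round trip should give back the original bytes.
[ "$back_line" = "$raw_line" ] && echo "round trip ok"
```

Only the token bytes change; the ASCII parts of the line pass through
untouched.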

For a wordlist based on a different charset, for example CP866 or
KOI8-R, use that charset in place of iso-8859-1 in the
dump/convert/load command sequences above.
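
A minimal sketch of the convert step with the charset as a parameter.
The function name dump_to_utf8 is hypothetical, and the charset name is
handed to iconv unchanged, so it must be one that the local iconv
knows.

```shell
# Hypothetical helper: convert dump text on stdin from the given
# charset to utf-8 and write it to stdout.
dump_to_utf8() {
    iconv -f "$1" -t utf-8
}

# Intended use (requires bogoutil and an existing wordlist.db):
#   bogoutil -d wordlist.db | dump_to_utf8 KOI8-R > wordlist.utf8.txt
#   bogoutil -l wordlist.db.new < wordlist.utf8.txt
```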

For a wordlist containing tokens from multiple languages, particularly
non-European languages, the conversion methods described above may not
work well.  Building a new wordlist from scratch will likely work
better, as the new wordlist will be based solely on unicode.
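
One way to see the problem in the unicode-to-raw direction: iconv
rejects text that does not fit the target charset.  The sketch below
assumes GNU iconv behaviour (a non-zero exit status on unconvertible
input); the function name fits_latin1 is hypothetical.

```shell
# Hypothetical check: succeeds only if the utf-8 text on stdin can be
# represented in iso-8859-1.
fits_latin1() {
    iconv -f utf-8 -t iso-8859-1 > /dev/null 2>&1
}

# \303\251 is e-acute (fits); \320\277 is Cyrillic pe (does not fit).
printf 'caf\303\251\n' | fits_latin1 && echo "latin token: ok"
printf '\320\277\n' | fits_latin1 || echo "cyrillic token: does not fit iso-8859-1"
```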


