bogofilter-0.95.1 - New Current Release

David Relson relson at osagesoftware.com
Mon Jun 27 03:34:42 CEST 2005


Greetings,

This release fixes some problems with the unicode support in 0.95.0.
Most notably, RFC 2047 encoded tokens are now converted to unicode (when
appropriate), and bogotune problems with unicode/raw wordlists have been
corrected.

Enjoy!

David

########################################################################

Files are available at http://sourceforge.net/projects/bogofilter for
download.

Here are the md5sums for the release:

6294cb374a0ed4fd467fd8131ed0c3e6  bogofilter-0.95.1-1.i586.rpm
b371d40b7f5e76d1f5738128ef296c74  bogofilter-0.95.1-1.src.rpm
362cecc36d9e82e8fd62496cb7231faf  bogofilter-0.95.1.tar.bz2
f58fa919ad9195ef9d8443c46e7b532d  bogofilter-0.95.1.tar.gz
4f7123851091c533538b4375585f6335  bogofilter-static-0.95.1-1.i586.rpm

########################################################################

Read the NEWS file for a more detailed list of changes since 0.95.0.

Below are the combined RELEASE.NOTES for the unicode releases - 0.95.x


### 0.95.1 ###

Some of the 'make check' scripts use '--unicode=no' and/or
'--unicode=yes' options.  These scripts will fail if bogofilter is
configured in a non-default manner, i.e. if --disable-unicode or
--enable-unicode is specified.

### 0.95.0 ###

This release supports unicode (utf-8).  A new meta-token .ENCODING has
been added to the wordlist so that bogofilter can determine if it's
using unicode or not.  A value of 1 indicates raw storage and 2
indicates utf-8 encoded tokens.  Bogofilter checks for this meta-token
and converts incoming text to utf-8 as appropriate.  
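
As a rough illustration of the meta-token check, the sketch below scans
the text produced by "bogoutil -d" for the .ENCODING entry and maps its
value to a storage mode.  The dump layout assumed here (a token followed
by numeric fields, with the encoding value in the first numeric field),
the helper name wordlist_encoding and the sample lines are assumptions
made for illustration; they are not taken from the bogofilter sources.

```python
# Hypothetical helper: report the storage mode recorded in a wordlist dump.
# Assumes each dump line looks like "token number number date" and that the
# .ENCODING value is the first numeric field; verify against a real dump.
ENCODING_NAMES = {1: "raw", 2: "utf-8"}  # values given in the release notes


def wordlist_encoding(dump_lines):
    """Return 'raw', 'utf-8', 'unknown', or None if .ENCODING is absent."""
    for line in dump_lines:
        fields = line.split()
        if len(fields) >= 2 and fields[0] == ".ENCODING":
            return ENCODING_NAMES.get(int(fields[1]), "unknown")
    return None  # no .ENCODING meta-token in this dump


# Made-up sample lines, not output from a real wordlist.
sample = [".MSG_COUNT 120 340 20050627", ".ENCODING 2 0 20050627"]
print(wordlist_encoding(sample))
```

A dump with no .ENCODING line yields None, which a caller might treat
as a wordlist created before the unicode releases.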

Command line options "--unicode=yes" and "--unicode=no" can be used.

    ° With bogofilter, they control encoding of newly created
      databases.

    ° With bogoutil, during wordlist maintenance they change wordlist
      to/from unicode.

    ° With bogolexer, they allow viewing parser results in new and
      old modes.

./configure options allow bogofilter customization.

    ° "./configure --enable-unicode" builds a bogofilter that will
      _always_ operate in unicode mode.

    ° "./configure --disable-unicode" builds a bogofilter that will
      _never_ operate in unicode mode.

Wordlists can be converted from raw storage to unicode using:

    bogoutil -d wordlist.db > wordlist.raw.txt
    iconv -f iso-8859-1 -t utf-8 < wordlist.raw.txt > wordlist.utf8.txt
    bogoutil -l wordlist.db.new < wordlist.utf8.txt

or:

    bogoutil --unicode=yes -m wordlist.db

Wordlists can be converted from unicode to raw storage using:

    bogoutil -d wordlist.db > wordlist.utf8.txt
    iconv -f utf-8  -t iso-8859-1 < wordlist.utf8.txt > wordlist.raw.txt
    bogoutil -l wordlist.db.new < wordlist.raw.txt

or:

    bogoutil --unicode=no -m wordlist.db

For a wordlist based on a different charset, for example CP866 or
KOI8-R, use that charset in the dump/convert/load command sequences
above.
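
For anyone who would rather script the middle step, the sketch below
does roughly the same job as the iconv call, here for a KOI8-R dump.
The names convert_dump and convert_file are placeholders chosen for
this example; only the charset conversion is shown, and the dump and
load steps still go through bogoutil as described above.

```python
# Sketch of the "convert" step only, standing in for:
#   iconv -f koi8-r -t utf-8 < in.txt > out.txt
def convert_dump(data: bytes, source_charset: str = "koi8-r") -> bytes:
    """Re-encode a wordlist dump from source_charset to utf-8."""
    # Decoding is strict by default, so input that is not valid in
    # source_charset raises UnicodeDecodeError rather than being altered.
    return data.decode(source_charset).encode("utf-8")


def convert_file(src_path, dst_path, source_charset="koi8-r"):
    """Read a dump file, convert it, and write the utf-8 result."""
    with open(src_path, "rb") as src:
        converted = convert_dump(src.read(), source_charset)
    with open(dst_path, "wb") as dst:
        dst.write(converted)
```

Passing "cp866" as source_charset would handle the other charset named
above in the same way.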

For a wordlist containing tokens from multiple languages, particularly
non-European languages, the conversion methods described above may not
work well.  Building a new wordlist from scratch will likely work
better, as the new wordlist will be based solely on unicode.
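
One likely reason is that a single source charset can only be right for
some of the tokens in a mixed wordlist.  The small sketch below, using a
made-up Russian token, shows the same stored bytes read under two
different charset assumptions.

```python
# The same raw bytes interpreted under two charset assumptions.
token = "привет".encode("koi8-r")        # a token stored raw as KOI8-R

as_latin1 = token.decode("iso-8859-1")   # what an iso-8859-1 conversion sees
as_koi8r = token.decode("koi8-r")        # what the sender actually wrote

# The iso-8859-1 reading succeeds without error but yields different text.
print(as_latin1 == as_koi8r)
```

Because the wrong reading raises no error, a conversion with the wrong
charset can produce unusable tokens without any warning.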



