Cyrillic issues in 0.94.12

Matthias Andree matthias.andree at gmx.de
Wed May 25 16:10:09 CEST 2005


Clint Adams <schizo at debian.org> writes:

>> Unicode is a great thing, but as it was already noted here some
>> time ago, Unicode will become really usable not sooner than Unix
>> OSes get full support for it in screen drivers, user software etc.
>> Until then, using national encodings is more convenient since a
>> user can read the tokens from the terminal, debug wordlists etc.
>
> bogofilter could use any Unicode encoding internally regardless of
> the charset used for display.  It seems silly to me to waste code on
> provincial encodings when all this could be done generically.

I share your view and I think we should let the --enable-russian switch
or whatever its name is today disappear in the 0.95 series. We don't
even need to provide an upgrade script for the users, as they can just
type:

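bogoutil -d wordlist.db >wordlist.ru                  # dump tokens in the old native charset
iconv -f CP866 -t UTF-8 <wordlist.ru >wordlist.UTF-8  # re-encode the dump as UTF-8
# upgrade to a bogofilter built with --enable-iconv
rm wordlist.db
bogoutil -l wordlist.db <wordlist.UTF-8               # reload the converted tokens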

How we register the tokens internally doesn't even matter for
passthrough mode, since that emits the data verbatim. For the parts that
print tokens to the screen, other than bogoutil -d (which must always
print the native format), we can convert to the current locale.
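Converting for screen output could be as simple as piping through iconv(1). A minimal sketch, assuming tokens are stored as UTF-8 and that `locale charmap` reports the display charset; the `to_locale` function name is hypothetical and not part of bogofilter:

```shell
#!/bin/sh
# to_locale (hypothetical helper): re-encode UTF-8 text from stdin into
# the charset of the current locale, for display only. Stored tokens and
# bogoutil -d output are left untouched.
to_locale() {
    # Ask which charset the current locale uses; fall back to UTF-8 if
    # locale(1) is unavailable or prints nothing.
    charset=$(locale charmap 2>/dev/null) || charset=UTF-8
    [ -n "$charset" ] || charset=UTF-8
    iconv -f UTF-8 -t "$charset"
}
```

For instance, `printf 'spam\n' | to_locale` prints the token unchanged in any ASCII-compatible locale, while Cyrillic tokens come out in CP866, KOI8-R or UTF-8 depending on the user's terminal settings.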

Besides that, is bogofilter -pvvv safe WRT RFC-2047? :)

-- 
Matthias Andree