iconv preview

David Relson relson at osagesoftware.com
Sun Jan 9 19:13:46 CET 2005


Greetings,

CVS has my iconv() patches.  

The default charset "UTF-8" is set in file src/charset_iconv.c, but can
be changed via "./configure --default-charset=...".  

Using the old charset code, parsing does some mapping of special
characters, for example 0x92 to 0x27 (the Windows apostrophe to an ASCII
apostrophe).  iconv() doesn't have this mapping AFAICT.  
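
A minimal sketch of the difference, not taken from the bogofilter
sources: the helper names map_special_chars() and cp1252_to_utf8() are
hypothetical, the input is assumed to be Windows-1252, and the ASCII
apostrophe (0x27) is assumed as the target of the old-style mapping.
It relies on the local iconv implementation accepting the charset name
"CP1252".

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the old table-driven mapping: replace the
 * Windows-1252 apostrophe (0x92) with an ASCII apostrophe (0x27), in place. */
static void map_special_chars(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0x92)
            buf[i] = 0x27;
    }
}

/* Hypothetical helper: convert len bytes of Windows-1252 text to UTF-8
 * with iconv().  Returns the number of bytes written to out, or
 * (size_t)-1 if the converter cannot be opened or the conversion fails. */
static size_t cp1252_to_utf8(const char *in, size_t len,
                             char *out, size_t outsize)
{
    iconv_t cd = iconv_open("UTF-8", "CP1252");   /* (tocode, fromcode) */
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inp = (char *)in;    /* iconv() wants a non-const char ** */
    char *outp = out;
    size_t inleft = len;
    size_t outleft = outsize;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;
    return outsize - outleft;
}
```

Fed the four bytes 'i' 't' 0x92 's', cp1252_to_utf8() alone should
produce six bytes, with 0x92 turned into the UTF-8 sequence E2 80 99
(U+2019, right single quotation mark) rather than an ASCII apostrophe.
Running map_special_chars() on the buffer first gives plain "it's".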

"make check" works fine -- but only because I removed the special
characters in the input files.  

I've also attached tarball test.0xA0.0xFF.tgz with script
test.0xA0.0xFF.sh which demonstrates the different behavior for special
characters.

You can download the code from CVS via the following commands:

  cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/bogofilter login

  When prompted for a password, press the RETURN key.
  After anonymously logging in:

  cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/bogofilter \
      co bogofilter

Alternatively, a source tarball is available at:

ftp://ftp.bogofilter.out/pub/outgoing/bogofilter-cvs/bogofilter-0.93.4.cvs.tar.gz

Consider the code to be of pre-beta quality.  It passes "make check",
but hasn't been extensively tested -- mostly because my knowledge of
UTF-8 and Unicode is quite sparse.  'Tis likely there are many details
yet to be worked out.  For example, I know it doesn't convert encoded
words in message header lines.

Have fun!

David


More information about the bogofilter-dev mailing list