... convert_unicode.c ...

David Relson relson at osagesoftware.com
Mon Jun 20 13:24:41 CEST 2005


On Mon, 20 Jun 2005 10:32:38 +0200
Matthias Andree wrote:

> David Relson <relson at osagesoftware.com> writes:
> 
> > Perhaps we'd do better if we disable translation when iconv_open()
> > rejects the character set ???
> 
> Questionable. UTF-8 on the output side is wrong no matter what - if we
> don't do this, and store unconverted data, we have both UTF-8 data and
> junk in the database. We'd better print "cannot convert input character
> set..." and ignore the message.

Hi Matthias,

With one change, to "non-UTF-8 on the output side", I agree!

The Unicode implementation converts from the input charset to UTF-8.
Message parsing begins with iso-8859-1 for the message headers and
then switches charsets as "Content-Type: ... charset=" directives are seen.
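As a rough illustration of that conversion step, here is a minimal sketch using POSIX iconv. The helper name to_utf8() is hypothetical and is not taken from bogofilter's convert_unicode.c; the sketch assumes the whole input fits in one buffer and treats any conversion error as a failure.

```c
#include <assert.h>
#include <iconv.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper (not bogofilter code): convert `inlen` bytes of `in`,
 * encoded in `charset`, to UTF-8 in `out`. Returns the number of bytes
 * written, or -1 if the charset is unknown, the input is invalid, or the
 * output buffer is too small. */
ssize_t to_utf8(const char *charset, const char *in, size_t inlen,
                char *out, size_t outcap)
{
    assert(charset != NULL && in != NULL && out != NULL);

    iconv_t cd = iconv_open("UTF-8", charset);   /* (tocode, fromcode) */
    if (cd == (iconv_t)-1)
        return -1;                               /* charset rejected */

    char *inp = (char *)in;     /* iconv() takes a non-const char ** */
    char *outp = out;
    size_t inleft = inlen, outleft = outcap;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &outp, &outleft);  /* flush shift state */
    iconv_close(cd);

    if (rc == (size_t)-1)
        return -1;
    return (ssize_t)(outcap - outleft);
}
```

For example, the ISO-8859-1 bytes "caf\xe9" come out as the five UTF-8 bytes "caf\xc3\xa9", while an unrecognized charset name makes iconv_open() fail and the helper return -1.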

The question of the moment is what to do when iconv_open() fails. As
you suggest, we could just ignore the message. That seems like a bad
idea, since one could add a dummy MIME body section with a bogus
charset and bogofilter would be disabled. Not good!

It would be better to turn off translation and simply parse whatever
text is present. Translation will resume at the next 
"Content-Type: ... charset=" directive.  True, some untranslated text
would be passed through, but the impact would probably be minor.
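The fallback described above might look roughly like the following sketch. It is an assumption-laden illustration, not bogofilter's actual code: struct charset_state, switch_charset() and convert_text() are hypothetical names. A rejected charset disables translation (the descriptor is left as (iconv_t)-1) and text is copied through unchanged until the next charset directive re-enables it. For simplicity, a multibyte sequence split across two chunks is treated as an error here.

```c
#include <assert.h>
#include <iconv.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical per-message conversion state. */
struct charset_state {
    iconv_t cd;        /* (iconv_t)-1 means "translation disabled" */
};

/* Call at the start of a message (e.g. with "ISO-8859-1") and at each
 * "Content-Type: ... charset=" directive. If iconv_open() rejects the
 * charset, translation is turned off rather than dropping the message.
 * Returns 1 if translation is active, 0 if it is disabled. */
int switch_charset(struct charset_state *st, const char *charset)
{
    if (st->cd != (iconv_t)-1)
        iconv_close(st->cd);
    st->cd = iconv_open("UTF-8", charset);
    return st->cd != (iconv_t)-1;
}

/* Convert one chunk of text. With translation disabled the bytes are
 * copied unchanged. Returns bytes written, or -1 on failure. */
ssize_t convert_text(struct charset_state *st, const char *in, size_t inlen,
                     char *out, size_t outcap)
{
    assert(st != NULL && in != NULL && out != NULL);

    if (st->cd == (iconv_t)-1) {            /* untranslated pass-through */
        if (inlen > outcap)
            return -1;
        memcpy(out, in, inlen);
        return (ssize_t)inlen;
    }

    char *inp = (char *)in, *outp = out;
    size_t inleft = inlen, outleft = outcap;
    if (iconv(st->cd, &inp, &inleft, &outp, &outleft) == (size_t)-1)
        return -1;
    return (ssize_t)(outcap - outleft);
}
```

Usage: initialize with `struct charset_state st = { (iconv_t)-1 };`, call `switch_charset(&st, "ISO-8859-1")` for the headers, and call it again whenever a new charset is declared. A bogus charset then costs only some untranslated tokens instead of disabling filtering for the whole message.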

Regards,

David