... convert_unicode.c ...

Wed Jun 22 11:15:54 CEST 2005

David Relson <relson at osagesoftware.com> writes:

> iconv_open( "to_charset", "from_charset" ) does the preparation for
> character set translation.  Bogofilter's iconvert() function uses
> glibc's iconv() to do the work.  When iconv_open() rejects the
> "from_charset", a message is output to stderr and bogofilter calls
> iconv_open() with "iso-8859-1" for both "from_charset" and
> "to_charset".  The effect of this is that _no_ translation is done.  It
> would be equally easy to call iconv_open( "utf-8", "iso-8859-1").
> Unfortunately I don't have information that says which is better:
>
> 1 - no translation
> 2 - iso-8859-1 to utf-8 translation

  3 - ignore non-ASCII tokens
  4 - fall back to user-specified default
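
As a rough illustration only (not bogofilter's actual code), the fallback
order could be sketched in C as below. The names open_converter, cs_result
and user_default are hypothetical; only iconv_open() and its
(tocode, fromcode) argument order come from the discussion above. The
sketch assumes option 4 is tried first, then option 2, and that the caller
handles option 1 or 3 itself when no converter can be opened.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical result codes; these are not bogofilter identifiers. */
typedef enum {
    CS_OK,           /* the declared from_charset was accepted          */
    CS_USER_DEFAULT, /* option 4: the user-specified default was used   */
    CS_LATIN1,       /* option 2: iso-8859-1 was assumed as the source  */
    CS_NONE          /* no converter; caller picks option 1 or 3        */
} cs_result;

/*
 * Try to open a converter into `to`, falling back step by step.
 * `from` and `user_default` may be NULL. On CS_NONE, *cd is (iconv_t)-1
 * and must not be passed to iconv() or iconv_close().
 */
static cs_result open_converter(const char *to, const char *from,
                                const char *user_default, iconv_t *cd)
{
    assert(to != NULL && cd != NULL);

    if (from != NULL) {
        *cd = iconv_open(to, from);
        if (*cd != (iconv_t)-1)
            return CS_OK;
        fprintf(stderr, "unsupported charset \"%s\"\n", from);
    }

    if (user_default != NULL) {
        *cd = iconv_open(to, user_default);
        if (*cd != (iconv_t)-1)
            return CS_USER_DEFAULT;
    }

    *cd = iconv_open(to, "ISO-8859-1");
    if (*cd != (iconv_t)-1)
        return CS_LATIN1;

    return CS_NONE;
}
```

A caller would check the result, run iconv() only when it is not CS_NONE,
and call iconv_close() on the descriptor afterwards. Which fallback order
is actually best is exactly the open question here.
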

> I'm interested in reasons (or precedents) for one way or the other.
> Anybody know if there's an RFC that applies?

<shrug>

Rejecting messages with an invalid character set specification on the
SMTP port, after the SMTP DATA phase, might be a viable solution.
Whitelist the character sets you can handle and discard the rest. This
is outside bogofilter's scope, however.

-- 
Matthias Andree