... convert_unicode.c ...
Matthias Andree
matthias.andree at gmx.de
Wed Jun 22 11:15:54 CEST 2005
David Relson <relson at osagesoftware.com> writes:
> iconv_open( "to_charset", "from_charset" ) does the preparation for
> character set translation. Bogofilter's iconvert() function uses
> glibc's iconv() to do the work. When iconv_open() rejects the
> "from_charset", a message is output to stderr and bogofilter calls
> iconv_open() with "iso-8859-1" for both "from_charset" and
> "to_charset". The effect of this is that _no_ translation is done. It
> would be equally easy to call iconv_open( "utf-8", "iso-8859-1").
> Unfortunately I don't have information that says which is better:
>
> 1 - no translation
> 2 - iso-8859-1 to utf-8 translation
3 - ignore non-ASCII tokens
4 - fall back to user-specified default
> I'm interested in reasons (or precedents) for one way or the other.
> Anybody know if there's an RFC that applies?
<shrug>
Rejecting messages with an invalid character set specification on the
SMTP port, after the SMTP DATA phase, might be a viable
solution. Whitelist those you can handle and discard the rest. This is
outside bogofilter's scope, however.
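
For reference, options 2 and 4 from the list above could be combined
roughly as follows. This is only a sketch, not bogofilter's actual code:
open_with_fallback() and its default_from parameter are hypothetical
names, and the iconv() calls assume the glibc/POSIX prototypes.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from bogofilter): open a conversion
 * descriptor, falling back when the message's charset is rejected.
 *
 *   1. try the charset named in the message (from_charset)
 *   2. on failure, report to stderr and try a user-specified default,
 *      if one was given (option 4)
 *   3. as a last resort, treat the input as iso-8859-1 (option 2)
 *
 * Returns (iconv_t)-1 only if every attempt fails.
 */
static iconv_t open_with_fallback(const char *to_charset,
                                  const char *from_charset,
                                  const char *default_from)
{
    iconv_t cd;

    assert(to_charset != NULL && from_charset != NULL);

    cd = iconv_open(to_charset, from_charset);
    if (cd != (iconv_t)-1)
        return cd;

    /* iconv_open() sets errno to EINVAL for an unsupported conversion. */
    fprintf(stderr, "iconv_open(\"%s\", \"%s\") failed: %s\n",
            to_charset, from_charset, strerror(errno));

    if (default_from != NULL) {
        cd = iconv_open(to_charset, default_from);
        if (cd != (iconv_t)-1)
            return cd;
    }

    /* Last resort: assume the input is iso-8859-1. */
    return iconv_open(to_charset, "ISO-8859-1");
}
```

Passing NULL for default_from skips step 2, so an unknown charset goes
straight to the iso-8859-1 assumption. Passing "iso-8859-1" as both
to_charset and from_charset would give option 1, i.e. no translation.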
--
Matthias Andree