RFC-2047

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Wed Jul 23 08:33:51 CEST 2003


Matthias Andree <matthias.andree at gmx.de> wrote:

>> But I don't see why the same word should show up several
>> times because of different codings.
>
>- Spam in different character sets, including falsely declared
>  ones. German-language spam comes undeclared, as ASCII, ISO-8859-1,
>  -15, Windows-1252. The same character sets are available for English,
>  Spanish and French.

The very same is true for ham. Reasonable people use
ISO-8859-1/15 or UTF-8 as needed. Lusers (Outbreak Excess)
declare nothing by default and otherwise prefer, I think,
Windows-1252. A mismatch between the declared and the actual
charset is not detected either.
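To make the duplicate-token point concrete, here is a minimal Python sketch (an illustration only, not bogofilter's C code) of one German word in two codings. A tokenizer working on raw bytes sees two different tokens. Decoding RFC 2047 encoded-words to Unicode first, with the standard library's `email.header.decode_header`, folds them back into one. The name `decode_rfc2047` and the Latin-1 fallback for unknown charset labels are my own assumptions. Note that this only helps when the declaration is honest: a falsely declared charset still decodes to the wrong characters.

```python
from email.header import decode_header

def decode_rfc2047(value: str) -> str:
    """Decode a header value containing RFC 2047 encoded-words to Unicode."""
    out = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, str):
            # No encoded-words at all: decode_header hands back the str as-is.
            out.append(chunk)
            continue
        try:
            # Parts outside encoded-words come back with charset None.
            out.append(chunk.decode(charset or "ascii", errors="replace"))
        except LookupError:
            # Unknown charset label (assumption: fall back to Latin-1,
            # which maps every byte to some character and never fails).
            out.append(chunk.decode("latin-1"))
    return "".join(out)

# The same word in two codings: a byte-oriented tokenizer sees two tokens.
raw_forms = {cs: "Grüße".encode(cs) for cs in ("iso-8859-1", "utf-8")}

# The same subject, declared two ways via RFC 2047 encoded-words.
subjects = ["=?iso-8859-1?q?Gr=FC=DFe?=", "=?utf-8?q?Gr=C3=BC=C3=9Fe?="]

if __name__ == "__main__":
    print(len(set(raw_forms.values())))           # → 2
    print({decode_rfc2047(s) for s in subjects})  # → {'Grüße'}
```

ISO-8859-15 and Windows-1252 happen to agree with ISO-8859-1 on ü and ß, so this word yields just two byte forms. A word containing € would split three ways (0xA4 in -15, 0x80 in 1252, E2 82 AC in UTF-8).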

>> Further, we already discussed that we cannot even tell what is
>> whitespace or punctuation if we don't understand the charset.
>
>True, but without such a developer, or at least feedback from a tester, this
>isn't going to change. I'm not adding code that I cannot test and that I
>cannot have tested by somebody.
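A small test case for the whitespace/punctuation problem could look like the Python sketch below. It is an illustration that assumes Python's Unicode tables as the classifier, not anything bogofilter actually does. The same byte gets a different class depending on the declared charset. In ISO-8859-1, 0x85 is the C1 control NEL, which Python counts as whitespace, but in Windows-1252 it is HORIZONTAL ELLIPSIS. Likewise, 0x96 is a plain control character in ISO-8859-1 but EN DASH in Windows-1252. The helper `classify()` and its four labels are hypothetical.

```python
import unicodedata

def classify(raw: bytes, charset: str) -> list[str]:
    """Decode raw bytes with the given charset and label each character.

    Hypothetical labels: "space", "punct", "letter", "other".
    """
    labels = []
    for ch in raw.decode(charset):
        if ch.isspace():  # Unicode whitespace, includes U+0085 (NEL)
            labels.append("space")
        elif unicodedata.category(ch).startswith("P"):  # Pc, Pd, Ps, Pe, Pi, Pf, Po
            labels.append("punct")
        elif ch.isalpha():
            labels.append("letter")
        else:
            labels.append("other")  # controls, symbols, digits, ...
    return labels

if __name__ == "__main__":
    for byte in (b"\x85", b"\x96"):
        for charset in ("iso-8859-1", "cp1252"):
            print(byte, charset, classify(byte, charset))
```

Run as a script, it reports 0x85 as space under ISO-8859-1 but punct under Windows-1252, and 0x96 as other under ISO-8859-1 but punct under Windows-1252. So a tokenizer that guesses the charset wrong splits or joins words in the wrong places.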

What exactly do you need for testing? Maybe I can help.

pi




More information about the Bogofilter mailing list