RFC-2047
Boris 'pi' Piwinger
3.14 at logic.univie.ac.at
Wed Jul 23 08:33:51 CEST 2003
Matthias Andree <matthias.andree at gmx.de> wrote:
>> But I don't see why the same word should show up several
>> times because of different codings.
>
>- Spam in different character sets, including falsely declared
> ones. German-language spam comes undeclared, as ASCII, ISO-8859-1,
> -15, Windows-1252. The same character sets are available for English,
> Spanish and French.
The very same is true for ham. Reasonable people use
ISO-8859-1/15 or UTF-8 as needed. Lusers (Outbreak Excess)
do not declare anything by default and prefer -- I think --
Windows-1252 otherwise. If the charset actually used differs
from the declared one, that is not detected either.
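
To illustrate why the same word can show up as several tokens, here
is a rough Python sketch (not bogofilter code; the sample word and the
helper name decode_encoded_words are made up for illustration). It
encodes one word in the charsets mentioned above and then decodes two
RFC 2047 encoded-words with the standard email.header module:

```python
from email.header import decode_header

def decode_encoded_words(raw: str) -> str:
    """Decode RFC 2047 encoded-words in a header value to Unicode text."""
    parts = []
    for data, charset in decode_header(raw):
        if isinstance(data, bytes):
            # Undeclared parts are treated as ASCII here; that is an assumption.
            parts.append(data.decode(charset or "ascii", errors="replace"))
        else:
            parts.append(data)
    return "".join(parts)

word = "Gr\u00fc\u00dfe"  # hypothetical sample word with non-ASCII letters

# The same word as raw bytes in the charsets named in this thread.
charsets = ("iso-8859-1", "iso-8859-15", "windows-1252", "utf-8")
raw_tokens = {word.encode(cs) for cs in charsets}

# Two differently encoded headers carrying that same word.
latin1_header = "=?iso-8859-1?q?Gr=FC=DFe?="
utf8_header = "=?utf-8?b?R3LDvMOfZQ==?="
decoded = {decode_encoded_words(h) for h in (latin1_header, utf8_header)}

print(len(raw_tokens), "distinct raw byte tokens")
print(len(decoded), "distinct token(s) after decoding")
```

For this particular word the three single-byte charsets happen to
agree, so the raw bytes differ only between them and UTF-8, while the
decoded headers collapse to a single token. None of this helps, of
course, when the charset is undeclared or declared falsely.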
>> Further, we already discussed that we cannot even tell what is
>> whitespace or punctuation if we don't understand the charset.
>
>True, but without such a developer or at least tester feedback, this
>isn't going to change. I'm not adding code that I cannot test and that I
>cannot have tested by somebody.
What exactly do you need for testing? Maybe I can help.
pi