Russian charsets and functions
Matthias Andree
matthias.andree at gmx.de
Tue Jan 4 13:06:22 CET 2005
Evgeny Kotsuba <evgen at shatura.laser.ru> writes:
> one problem is that the charset may be set improperly, by the mail
> client and/or the spammer; a second problem is that the database would
> double. Really, English/American users don't need Russian or Asian spam
> or mail, Russians don't need Asian spam/mail, and all English letters
> are placed in 0-127 and Russian ones in 128-255. All the truly
> multi-language documents I have seen were sent as .doc or .pdf and so on.
A few mails ago you documented how the same Cyrillic characters are
encoded differently in the different character sets, so I presume a
spammer who actually exploits this (we saw a time when spammers made
massive use of ISO-8859-* accented Latin characters) will have to
declare the proper character set unless he wants to produce garbage.
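
To illustrate the point about encodings, here is a minimal Python sketch
(not bogofilter code; the charset list, the sample word and the helper
name encode_all are my own assumptions). It encodes one Cyrillic word in
several common 8-bit Cyrillic charsets and then decodes the KOI8-R bytes
under the wrong label:

```python
# Illustration only; charset list and helper name are assumptions.
CHARSETS = ("koi8_r", "cp1251", "iso8859_5", "cp866")

def encode_all(text):
    """Return {charset: bytes} for the same text in each charset."""
    return {name: text.encode(name) for name in CHARSETS}

# Cyrillic letters es, pe, a, em (the word "spam" as written in Russian).
word = "\u0441\u043f\u0430\u043c"

encoded = encode_all(word)
for name, raw in encoded.items():
    # Same characters, different byte values in every charset.
    print(name, raw.hex())

# Bytes produced as KOI8-R but decoded under a wrong charset label
# come out as different Cyrillic letters, not the original word.
wrong = encoded["koi8_r"].decode("cp1251")
print(wrong == word)
```

Every byte lands in the 128-255 range in all four charsets, but the
values differ, so tokens built from the raw bytes only match when the
declared charset is the one the sender really used.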
We don't check .doc or .pdf unless they are wrongly marked as text/*.
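
As a rough sketch of that rule (using Python's standard email package,
not bogofilter's actual parser; the sample message and the helper name
text_parts are made up for illustration), only the parts labelled
text/* are picked up, including a .doc attachment wrongly marked as
text/plain:

```python
from email import message_from_string

def text_parts(msg):
    """Yield (content_type, declared_charset) for each text/* part."""
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            yield part.get_content_type(), part.get_content_charset()

# Hypothetical three-part message: real text, a PDF, and a .doc
# attachment that the sender mislabelled as text/plain.
raw = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="b"\n'
    "\n"
    "--b\n"
    'Content-Type: text/plain; charset="koi8-r"\n'
    "\n"
    "hello\n"
    "--b\n"
    "Content-Type: application/pdf\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "JVBERi0=\n"
    "--b\n"
    'Content-Type: text/plain; name="report.doc"\n'
    "\n"
    "binary junk\n"
    "--b--\n"
)

parts = list(text_parts(message_from_string(raw)))
print(parts)
```

The application/pdf part is skipped, while the mislabelled attachment
is treated as text and has no declared charset at all.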
--
Matthias Andree
More information about the bogofilter-dev mailing list