Russian charsets and functions

Matthias Andree matthias.andree at gmx.de
Tue Jan 4 13:06:22 CET 2005


Evgeny Kotsuba <evgen at shatura.laser.ru> writes:

> one problem is that the charset may be set improperly, by the mail client
> and/or the spammer; the second problem will be doubling the database. Really,
> English speakers/Americans don't need Russian or Asian spam or mail, Russians
> don't need Asian spam/mail, and all English letters are placed in 0-127
> and Russian ones in 128-255. All the really multi-language documents I have
> seen were sent as .doc or .pdf and so on.
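The byte-range split mentioned in the quote holds for single-byte Cyrillic charsets such as KOI8-R or windows-1251, but not for UTF-8, where every Cyrillic letter takes two bytes, both at or above 128. A minimal Python sketch that counts low and high bytes; the helper `byte_ranges` and the sample text are hypothetical illustrations, and the charset names are Python's codec names:

```python
# Count how many bytes of an encoded text fall into the ASCII range (0-127)
# and how many into the high range (128-255).
def byte_ranges(data: bytes) -> tuple[int, int]:
    """Return (low, high): bytes below 128 and bytes from 128 to 255."""
    low = sum(1 for b in data if b < 128)
    return low, len(data) - low

SAMPLE = "spam спам"  # four Latin letters, a space, four Cyrillic letters

if __name__ == "__main__":
    for charset in ("koi8_r", "cp1251", "utf-8"):
        print(charset, byte_ranges(SAMPLE.encode(charset)))
```

Both single-byte charsets yield 5 low and 4 high bytes, while UTF-8 yields 5 low and 8 high bytes, so a filter that reasons about byte ranges must know which charset it is looking at.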

A few mails earlier you documented how the same Cyrillic characters are
encoded differently in the different character sets, so I presume that a
spammer actually exploiting this (we saw a time when spammers massively
used ISO-8859-* accented Latin characters) will have to specify the
proper character set unless he wants to produce garbage.
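To illustrate why the declared charset matters, here is a small Python sketch that encodes one Cyrillic word in several single-byte charsets and then decodes the KOI8-R bytes as windows-1251. The names `WORD`, `CHARSETS` and `encodings` are hypothetical; the codec names are Python's standard ones:

```python
# Encode one Cyrillic word in several single-byte charsets and show that
# the byte values differ, then decode with the wrong charset.
WORD = "спам"  # Russian for "spam"
CHARSETS = ("koi8_r", "cp1251", "iso8859_5", "cp866")

def encodings(word: str) -> dict[str, bytes]:
    """Map each charset name to the bytes that encode `word` in it."""
    return {cs: word.encode(cs) for cs in CHARSETS}

if __name__ == "__main__":
    for cs, data in encodings(WORD).items():
        print(f"{cs:10} {data.hex(' ')}")
    # KOI8-R bytes read as windows-1251: same byte count, wrong letters.
    print(encodings(WORD)["koi8_r"].decode("cp1251"))
```

Each charset produces a different byte sequence for the same word, and decoding with a charset other than the one actually used yields different letters, which is the garbage a mislabeling spammer would send.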

We don't check .doc or .pdf unless they are wrongly marked as text/*.

-- 
Matthias Andree



More information about the bogofilter-dev mailing list