Problems with default charset and map_xlate_characters
David Relson
relson at osagesoftware.com
Thu Sep 25 14:02:29 CEST 2003
On Thu, 25 Sep 2003 15:10:16 +0400
Evgeny Kotsuba <evgen at shatura.laser.ru> wrote:
> Hi,
>
> It seems that with the default charset, the wrong things are done for
> any mail charset except those well known to bogofilter, even if
> allow_nonascii_replacement = 0. The problem is with
> map_xlate_characters, which has nothing in common with ASCII. Say I
> have a letter in the Russian KOI8-R encoding, which should be the
> standard for Russian and is used in Unix and in "right" mailers.
> There may also be a number of encodings for other ex-USSR republics
> like Ukraine and more, and we now have encodings for Russia's
> national republics (something like states in the US or provinces in
> Canada).
>
> Also, a comment on map_nonascii_characters: this is a very bad
> function for any statistics etc. I have made a Russian codepage
> decoder for mails with wrong double and triple recodings, and have
> named such encodings "Debillnaia" (de-billy's), because if you have
> a message like ???? ??? ?? ???? any decoding will be wrong.
> So if you have a lot of spam messages in a foreign encoding that map
> to ???? ????? etc., and then get any short letter with some foreign
> words (say a signature, the user's name, etc.), what will happen?
>
> SY,
> EK
Evgeny,
Charset support is a known gap in bogofilter. It was written
by an English-centric coder, namely me. It works well for people whose
ham is all ISO-8859-1. Similarly, the replace_nonascii_character option
was included as a way to deal with Asian spam. Again, it works fine in
an English-centric environment.
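To make the collision Evgeny describes concrete, here is a minimal Python sketch. The helper `replace_nonascii` is hypothetical and is not bogofilter's code; it only assumes, as the ???? examples above suggest, that replacement turns every byte at or above 0x80 into '?'. Two different Russian words, encoded with Python's built-in `koi8_r` codec, end up as the same token.

```python
def replace_nonascii(data: bytes, replacement: int = ord("?")) -> bytes:
    """Hypothetical helper: map every byte >= 0x80 to `replacement`."""
    return bytes(b if b < 0x80 else replacement for b in data)


# Two different Russian words ("hello" and "letter"), encoded in KOI8-R.
hello = "привет".encode("koi8_r")
letter = "письмо".encode("koi8_r")

# Decoded with the right charset, the words are distinct.
print(hello.decode("koi8_r") == letter.decode("koi8_r"))  # False

# After replacement, both collapse into the same six-character token.
print(replace_nonascii(hello))   # b'??????'
print(replace_nonascii(letter))  # b'??????'
```

Any statistics collected for such a token say nothing about which words actually appeared, so a short ham message with a few foreign words can inherit the spam score of unrelated foreign-language spam.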
Because there are other needs, a basic framework was created to
support other character sets. Support for different charsets and
languages is a task (or set of tasks) waiting for an interested person
(or persons) to fill in the details.
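As a rough illustration of what per-charset support might involve, the following Python sketch decodes a message body with its declared charset and falls back to a default only when the charset is missing or unknown. `decode_body`, `tokens`, `WORD_RE`, and `DEFAULT_CHARSET` are hypothetical names, not part of bogofilter, and the sketch relies on Python's codec registry rather than on bogofilter's own translation tables.

```python
import codecs
import re
from typing import List, Optional

# Hypothetical default, mirroring an ISO-8859-1-centric setup.
DEFAULT_CHARSET = "iso-8859-1"

# A run of Unicode letters: word characters minus digits and underscore.
WORD_RE = re.compile(r"[^\W\d_]+")


def decode_body(raw: bytes, declared_charset: Optional[str]) -> str:
    """Decode with the declared charset; use the default only when the
    charset is missing or not known to Python's codec registry."""
    charset = declared_charset or DEFAULT_CHARSET
    try:
        codecs.lookup(charset)
    except LookupError:
        charset = DEFAULT_CHARSET
    return raw.decode(charset, errors="replace")


def tokens(text: str) -> List[str]:
    """Split decoded text into lowercase word tokens."""
    return [word.lower() for word in WORD_RE.findall(text)]


raw = "Привет мир".encode("koi8_r")
print(tokens(decode_body(raw, "koi8-r")))     # ['привет', 'мир']
print(tokens(decode_body(raw, "x-unknown")))  # decoded as ISO-8859-1 instead
```

With the fallback, the tokens are still distinct, but they are Latin-1 mojibake rather than Russian words, which is roughly the concern raised above about the default charset being applied to unfamiliar charsets.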
If you'd care to take on the task of supporting Russian KOI8-R (or other
languages), we'd be glad to include it in bogofilter.
Peace,
David
More information about the bogofilter-dev mailing list