replace_nonascii_characters [was: using iconv()]
Evgeny Kotsuba
evgen at shatura.laser.ru
Sun Jan 9 23:50:04 CET 2005
David Relson wrote:
>On Sun, 09 Jan 2005 21:01:50 +0300
>Evgeny Kotsuba wrote:
>
>...[snip]...
>
>
>
>>By the way, I don't understand the reason for using
>>replace_nonascii_characters in init_charset_table() :
>>void init_charset_table(const char *charset_name)
>>{
>>......
>> if (replace_nonascii_characters &&
>> charset->allow_nonascii_replacement)
>> map_nonascii_characters();
>>...
>>i.e. if we have replace_nonascii_characters set, then everything will be
>>converted to ?? in other places, but if we don't use
>>replace_nonascii_characters, but still want to ignore some codepages,
>>say, Asian ones, and charset->allow_nonascii_replacement is set, then we
>>can't do it. So I commented it out in my code:
>> if ( /* replace_nonascii_characters && */
>> charset->allow_nonascii_replacement)
>>
>>
>
>Evgeny,
>
>replace_nonascii_characters is useful mostly for users of us-ascii and
>English speakers, as English doesn't use characters above 0x80 (except
>for some punctuation in the Windows charset).
>
>Most of the mail I receive with characters above 0x80 is Asian language
>spam. Bogofilter makes an attempt to parse such messages, though the
>results don't make sense (semantically speaking). Substituting '?' for
>high bit characters results in a smaller wordlist as many tokens will
>map to (for example) '????a?'.
>
>
For some non-English speakers, replacing non-ASCII characters is also a
very good thing for the same reasons, but only for Asian codepages or, more
correctly, for codepages with allow_nonascii_replacement set. In all
internet software, Russians almost automatically set
"replace_nonascii_characters=false", "allow 8bit coding" and so on.
With replace_nonascii_characters=false, the condition
if ( replace_nonascii_characters && charset->allow_nonascii_replacement)
will always be false...
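
To make the difference concrete, here is a minimal sketch of the two guards.
It is not bogofilter's actual code: struct charset_info, guard_original() and
guard_modified() are hypothetical names, and only the two flags come from the
snippet quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-charset table entry; only the
 * allow_nonascii_replacement flag is taken from the quoted code. */
struct charset_info {
    const char *name;
    bool allow_nonascii_replacement;
};

/* Guard as quoted from init_charset_table(): both flags must be set. */
static bool guard_original(bool replace_nonascii_characters,
                           const struct charset_info *cs)
{
    assert(cs != NULL);
    return replace_nonascii_characters && cs->allow_nonascii_replacement;
}

/* Guard with the global option commented out: the charset decides alone. */
static bool guard_modified(bool replace_nonascii_characters,
                           const struct charset_info *cs)
{
    assert(cs != NULL);
    (void)replace_nonascii_characters; /* deliberately ignored */
    return cs->allow_nonascii_replacement;
}
```

With the global option off, guard_original() can never be true, whatever the
charset says, while guard_modified() still follows the per-charset flag.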
By the way, why substitute '?' and not just a space?
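
A small sketch of what the choice changes, assuming a toy tokenizer that
splits on whitespace only (the real lexer is more involved, and
replace_high_bit() and count_tokens() are hypothetical helpers, not
bogofilter functions):

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Overwrite every byte that has the high bit set with `repl`, in place. */
static void replace_high_bit(char *s, char repl)
{
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if ((unsigned char)*s >= 0x80)
            *s = repl;
    }
}

/* Count whitespace-separated tokens; a toy stand-in for a real lexer. */
static size_t count_tokens(const char *s)
{
    size_t n = 0;
    int in_token = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (isspace((unsigned char)*s)) {
            in_token = 0;
        } else if (!in_token) {
            in_token = 1;
            n++;
        }
    }
    return n;
}
```

Under that assumption, '?' keeps the word boundaries where they were, so a
mixed word stays one token, while a space cuts the word at every replaced
byte and leaves only the ASCII fragments as tokens.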
SY,
EK