replace_nonascii_characters [was: using iconv()]

Sun Jan 9 23:50:04 CET 2005

David Relson wrote:

>On Sun, 09 Jan 2005 21:01:50 +0300
>Evgeny Kotsuba wrote:
>
>...[snip]...
>
>  
>
>>By the way, I don't understand the reason for using
>>replace_nonascii_characters in init_charset_table():
>>void init_charset_table(const char *charset_name)
>>{
>>......
>>           if (replace_nonascii_characters &&
>>               charset->allow_nonascii_replacement)
>>               map_nonascii_characters();
>>...
>>i.e. if we have replace_nonascii_characters set, then everything will be
>>converted to '?' in other places, but if we don't use
>>replace_nonascii_characters and still want to ignore some codepages,
>>say, Asian ones, and charset->allow_nonascii_replacement is set, then we
>>can't do it.  So I commented it out in my code:
>>           if ( /* replace_nonascii_characters && */
>>               charset->allow_nonascii_replacement)
>>    
>>
>
>Evgeny,
>
>replace_nonascii_characters is useful mostly for users of us-ascii and
>English speakers, as English doesn't use characters above 0x80 (except
>for some punctuation in the Windows charset).
>
>Most of the mail I receive with characters above 0x80 is Asian-language
>spam.  Bogofilter makes an attempt to parse such messages, though the
>results don't make sense (semantically speaking). Substituting '?' for
>high bit characters results in a smaller wordlist as many tokens will
>map to (for example) '????a?'.
>  
>
For some non-English speakers, replacing non-ASCII characters is also a
very good thing for the same reasons, but only for Asian codepages, or more
correctly, for codepages with allow_nonascii_replacement.  In all
internet software, Russians almost automatically set
"replace_nonascii_characters=false", "allow 8bit coding" and so on.
With replace_nonascii_characters=false,

if ( replace_nonascii_characters && charset->allow_nonascii_replacement)

is always false...
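
To make the difference concrete, here is a minimal sketch of the two
conditions side by side. The struct and the function names
(charset_def_t, should_map_original, should_map_patched) are made up
for illustration and are not bogofilter's real declarations; only
replace_nonascii_characters and allow_nonascii_replacement come from
the code quoted above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a charset table entry; only the one field
 * discussed in this thread is modelled. */
typedef struct {
    bool allow_nonascii_replacement;
} charset_def_t;

/* Condition as quoted from init_charset_table(): the global option
 * and the per-charset flag must both be set. */
bool should_map_original(bool replace_nonascii_characters,
                         const charset_def_t *charset)
{
    return replace_nonascii_characters &&
           charset->allow_nonascii_replacement;
}

/* Condition with the global option commented out: the per-charset
 * flag alone decides. */
bool should_map_patched(const charset_def_t *charset)
{
    return charset->allow_nonascii_replacement;
}
```

With replace_nonascii_characters=false, should_map_original() returns
false for every charset, while should_map_patched() still returns true
for a charset whose allow_nonascii_replacement flag is set.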

By the way, why substitute '?' and not just a space?
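
A small sketch of what the question is getting at, assuming a
simplified tokenizer that splits on whitespace only (this is not
bogofilter's real lexer, and replace_high_bit and count_tokens are
hypothetical helpers): the substitute character decides whether a word
with high-bit bytes in the middle stays one token or falls apart.

```c
#include <ctype.h>
#include <stddef.h>

/* Replace every byte that has the high bit set with `sub`, in place. */
void replace_high_bit(char *text, char sub)
{
    for (unsigned char *p = (unsigned char *)text; *p != '\0'; p++) {
        if (*p & 0x80)
            *p = (unsigned char)sub;
    }
}

/* Count whitespace-separated tokens (simplified; not bogofilter's lexer). */
size_t count_tokens(const char *text)
{
    size_t count = 0;
    int in_token = 0;

    for (const unsigned char *p = (const unsigned char *)text; *p != '\0'; p++) {
        if (isspace(*p)) {
            in_token = 0;
        } else if (!in_token) {
            in_token = 1;   /* first character of a new token */
            count++;
        }
    }
    return count;
}
```

For the bytes 'a' 'b' 0xE4 0xF6 'c' 'd', substituting '?' gives the
single token "ab??cd", while substituting a space gives the two tokens
"ab" and "cd".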

SY,
EK