replace_nonascii_characters [was: using iconv()]

David Relson relson at osagesoftware.com
Sun Jan 9 20:42:21 CET 2005


On Sun, 09 Jan 2005 21:01:50 +0300
Evgeny Kotsuba wrote:

...[snip]...

> By the way, I don't understand the reason for using
> replace_nonascii_characters in init_charset_table():
> void init_charset_table(const char *charset_name)
> {
> ......
>            if (replace_nonascii_characters &&
>                charset->allow_nonascii_replacement)
>                map_nonascii_characters();
> ...
> i.e. if we have replace_nonascii_characters set, then everything will be
> converted to ?? in other places, but if we don't use
> replace_nonascii_characters and still want to ignore some codepages,
> say, Asian ones, and charset->allow_nonascii_replacement is set, then we
> can't do it.  So I commented it out in my code:
>            if ( /* replace_nonascii_characters && */
>                charset->allow_nonascii_replacement)

Evgeny,

replace_nonascii_characters is useful mostly for users of us-ascii and
English speakers, as English doesn't use characters at or above 0x80
(except for some punctuation in the Windows charset).

Most of the mail I receive with characters above 0x80 is Asian-language
spam.  Bogofilter makes an attempt to parse such messages, though the
resulting tokens don't make sense (semantically speaking).  Substituting
'?' for high-bit characters results in a smaller wordlist, as many
distinct tokens will map to the same string (for example, '????a?').
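
To illustrate the substitution, here is a minimal sketch using a 256-entry
translation table.  It is not bogofilter's actual code: the names
xlate_table, init_identity_table, map_high_bit_to_question_mark and
translate_buffer are hypothetical and chosen only for this example.

```c
#include <stddef.h>

/* Hypothetical names throughout; this is not bogofilter's source. */
static unsigned char xlate_table[256];

/* Start from the identity mapping: every byte maps to itself. */
static void init_identity_table(void)
{
    for (int ch = 0; ch < 256; ch++)
        xlate_table[ch] = (unsigned char)ch;
}

/* Map every byte with the high bit set (0x80..0xFF) to '?'. */
static void map_high_bit_to_question_mark(void)
{
    for (int ch = 0x80; ch < 256; ch++)
        xlate_table[ch] = '?';
}

/* Translate a buffer in place through the table. */
static void translate_buffer(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = xlate_table[buf[i]];
}
```

With the table set up this way, a token made of the bytes C4 E3 BA C3,
then 'a', then A1 comes out as '????a?', and so does any other token
with high-bit bytes in the same positions.  That is why the wordlist
shrinks.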

HTH,

David


