FAQ: Asian spam

David Relson relson at osagesoftware.com
Thu Mar 27 03:39:27 CET 2003


At 06:06 PM 3/26/03, Boris 'pi' Piwinger wrote:

>Simon Huggins <huggie at earth.li> wrote:
>
> >> I think there should be something on asian spam.
> >
> >Er, bogofilter works fine on Asian (and other) spam.  Once it's seen
> >some (and I have a fair bit in my training folder) it works very well.
> >
> >I'm not sure why you would want something specific about this as
> >distinct from say Viagra spam or Nigerian scam spam?
>
>Well, the problem is that Bogofilter cannot really
>understand the text. I'm not sure how the lexer performs
>there and if this potentially blows up the database.
>
>pi


pi,

You're forgetting bogofilter's replace_nonascii_characters option, which
converts high-bit characters to question marks and can be used to minimize
the tokens generated by Asian-language spam.  It is most helpful for people
who receive primarily English-language messages, since it doesn't affect
characters 0x00 through 0x7F.  The option doesn't mix well with European
languages, whose many accented vowels and consonants fall in the 0x80-0xFF
region.
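
To illustrate the effect, here is a minimal sketch of that kind of
replacement.  It is not bogofilter's own code (bogofilter is written in C);
the function name replace_nonascii is hypothetical, and the sample strings
and encodings (Latin-1, GB2312) are assumptions chosen for the demonstration.

```python
def replace_nonascii(data: bytes) -> bytes:
    """Replace every byte with the high bit set (0x80-0xFF) by b'?'.

    Bytes 0x00 through 0x7F pass through unchanged.
    """
    return bytes(b if b < 0x80 else ord("?") for b in data)


# Plain ASCII text is left untouched.
english = b"Cheap meds, buy now"

# Latin-1 text: the accented letters live in 0x80-0xFF and are lost.
latin1 = "caf\u00e9 na\u00efve".encode("latin-1")

# GB2312 text: each character is two high-bit bytes, so the whole
# string collapses into a run of question marks.
gb2312 = "\u4f60\u597d".encode("gb2312")

for sample in (english, latin1, gb2312):
    print(replace_nonascii(sample))
```

Runs of question marks collapse many distinct multi-byte sequences into a
few repetitive tokens, which is why the word list stays small, and the
Latin-1 sample shows why the same treatment hurts accented European text.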

David