FAQ: Asian spam
David Relson
relson at osagesoftware.com
Thu Mar 27 03:39:27 CET 2003
At 06:06 PM 3/26/03, Boris 'pi' Piwinger wrote:
>Simon Huggins <huggie at earth.li> wrote:
>
> >> I think there should be something on asian spam.
> >
> >Er, bogofilter works fine on Asian (and other) spam. Once it's seen
> >some (and I have a fair bit in my training folder) it works very well.
> >
> >I'm not sure why you would want something specific about this as
> >distinct from say Viagra spam or Nigerian scam spam?
>
>Well, the problem is that Bogofilter cannot really
>understand the text. I'm not sure how the lexer performs
>there and if this potentially blows up the database.
>
>pi
pi,
You're forgetting bogofilter's replace_nonascii_characters option, which
converts high-bit characters to question marks and can be used to minimize
the tokens resulting from Asian-language spam. It is most helpful for people
who receive primarily English-language messages, since it doesn't affect
characters 0x00 through 0x7F. The option doesn't mix well with European
languages, whose many accented vowels and consonants fall in the 0x80-0xFF
region.
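
To make the effect concrete, here is a minimal sketch of the byte mapping
described above. It is an illustration only, not bogofilter's actual code:
the function name replace_nonascii and the sample byte strings are
made up for the example, and the Latin-1 encoding of the German sample is
an assumption.

```python
def replace_nonascii(data: bytes) -> bytes:
    """Map every byte in 0x80-0xFF to '?' and leave 0x00-0x7F unchanged.

    Hypothetical helper for illustration; bogofilter's own implementation
    may differ in detail.
    """
    return bytes(b if b < 0x80 else ord("?") for b in data)


# Plain ASCII text passes through untouched.
english = b"Buy cheap pills now"

# Latin-1 encoded German (assumed encoding): the accented letters are
# single bytes in the 0x80-0xFF range, so words get damaged.
german = "Grüße aus München".encode("latin-1")

# Four arbitrary high-bit bytes standing in for double-byte Asian text.
double_byte = bytes([0xC3, 0xE2, 0xB7, 0xD1])

for sample in (english, german, double_byte):
    print(replace_nonascii(sample))
```

Every run of high-bit bytes collapses to question marks, so such text can
no longer produce a large number of distinct tokens, while the German
sample shows why the option is a poor fit for accented European text.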
David