FAQ: Asian spam

David Relson relson at osagesoftware.com
Thu Mar 27 03:32:21 CET 2003


At 06:06 PM 3/26/03, Boris 'pi' Piwinger wrote:

>Simon Huggins <huggie at earth.li> wrote:
>
> >> I think there should be something on asian spam.
> >
> >Er, bogofilter works fine on Asian (and other) spam.  Once it's seen
> >some (and I have a fair bit in my training folder) it works very well.
> >
> >I'm not sure why you would want something specific about this as
> >distinct from say Viagra spam or Nigerian scam spam?
>
>Well, the problem is that Bogofilter cannot really
>understand the text. I'm not sure how the lexer performs
>there and if this potentially blows up the database.
>
>pi

pi,

Bogofilter has the replace_nonascii_characters option, which replaces
high-bit characters, i.e. 0x80-0xFF, with question marks.  For people who
don't receive mail using these characters, this reduces the number of
tokens that go into the database.  Unfortunately this option only helps
those who receive 7-bit ASCII, i.e. English.  The option affects the many
accented vow
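
As a rough illustration of the replacement described above, here is a small
Python sketch.  It is not bogofilter's actual code; the function name
replace_high_bit is made up for this example, and the only behaviour taken
from the description is "bytes 0x80-0xFF become question marks".

```python
def replace_high_bit(data: bytes) -> bytes:
    """Replace every byte in the range 0x80-0xFF with '?'.

    Hypothetical helper for illustration only; 7-bit ASCII bytes
    (0x00-0x7F) pass through unchanged.
    """
    return bytes(b if b < 0x80 else ord("?") for b in data)


# Plain 7-bit ASCII text is left alone.
print(replace_high_bit(b"cheap pills"))            # b'cheap pills'

# An accented character in Latin-1 is a single high-bit byte (0xE9),
# so it collapses to one question mark.
print(replace_high_bit("café".encode("latin-1")))  # b'caf?'
```

Because every high-bit byte maps to the same character, words that differ
only in those bytes end up as the same token, which is why fewer distinct
tokens reach the database.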




