FAQ: Asian spam

David Relson relson at osagesoftware.com
Thu Mar 27 03:25:39 CET 2003

At 01:48 PM 3/26/03, Simon Huggins wrote:

>On Wed, Mar 26, 2003 at 05:33:17PM +0100, Boris 'pi' Piwinger wrote:
> > I think there should be something on asian spam.
>Er, bogofilter works fine on Asian (and other) spam.  Once it's seen
>some (and I have a fair bit in my training folder) it works very well.
>I'm not sure why you would want something specific about this as
>distinct from say Viagra spam or Nigerian scam spam?


Bogofilter's parsing and scoring of Asian spam is something of a mystery to 
me.  As I don't know the languages involved and haven't paid any attention 
to the character sets, I can't say whether the tokens produced by the lexer 
actually correspond to words or not.  I do know that bogofilter _does_ 
process messages with Asian character sets and produce spam scores that 
work fine for ignorant me.

Do any of you know whether or not bogofilter's processing is correct for 
the Chinese, Japanese, or Korean charsets?  Or is the parsing totally 
bogus, but sufficiently repeatable to produce usable spam/ham classifications?
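
To illustrate what "bogus, but sufficiently repeatable" could mean, here is a 
minimal Python sketch.  It is not bogofilter's lexer: the delimiter set and 
the names naive_tokens and token_counts are hypothetical, and GB2312 is just 
one example charset.  The tokenizer splits raw bytes on ASCII delimiters with 
no knowledge of the language, yet the same phrase always yields the same 
byte token, which is all a statistical filter needs in order to count it.

```python
import re
from collections import Counter

# Hypothetical delimiter set: ASCII whitespace plus some punctuation.
# An assumption for illustration only, not bogofilter's actual lexer rules.
DELIMS = re.compile(rb'[\s.,;:!?()<>"]+')


def naive_tokens(raw: bytes) -> list:
    """Split raw message bytes on ASCII delimiters, ignoring the charset."""
    return [tok for tok in DELIMS.split(raw) if tok]


def token_counts(text: str, charset: str) -> Counter:
    """Encode text in the given charset and count the resulting byte tokens."""
    return Counter(naive_tokens(text.encode(charset)))


if __name__ == "__main__":
    # The tokens are opaque byte strings, not necessarily words, but the
    # repeated phrase maps to the same token both times it appears.
    sample = "免费 中国 免费"
    for token, count in token_counts(sample, "gb2312").items():
        print(token.hex(), count)
```

Text written without spaces between words would come out as one long token 
per run, so such tokens need not be words at all; they are merely consistent 
from one message to the next.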

