FAQ: Asian spam
David Relson
relson at osagesoftware.com
Thu Mar 27 03:25:39 CET 2003
At 01:48 PM 3/26/03, Simon Huggins wrote:
>On Wed, Mar 26, 2003 at 05:33:17PM +0100, Boris 'pi' Piwinger wrote:
> > I think there should be something on Asian spam.
>
>Er, bogofilter works fine on Asian (and other) spam. Once it's seen
>some (and I have a fair bit in my training folder) it works very well.
>
>I'm not sure why you would want something specific about this as
>distinct from say Viagra spam or Nigerian scam spam?
Simon,
Bogofilter's parsing and scoring of Asian spam is something of a mystery to
me. As I don't know the languages involved and haven't paid any attention
to the character sets, I can't say whether the tokens produced by the lexer
actually correspond to words or not. I do know that bogofilter _does_
process messages with Asian character sets and produce spam scores that
work fine for ignorant me.
Do any of you know whether or not bogofilter's processing is correct for
the Chinese, Japanese, or Korean charsets? Or is the parsing totally
bogus, but sufficiently repeatable to produce usable spam/ham classifications?
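
To make the "bogus but repeatable" possibility concrete, here is a rough
sketch in Python. It is NOT bogofilter's lexer: TOKEN_RE and tokenize() are
made-up names, and the rule "a token is a run of ASCII alphanumerics or
high-bit bytes" is only an assumption about how a byte-oriented lexer might
behave. It shows that such a lexer would give identical tokens for identical
bytes, even though the tokens are whole phrases rather than words and depend
on the charset used.

```python
import re

# Hypothetical byte-oriented token rule (an assumption, not bogofilter's
# real lexer): a token is a run of ASCII alphanumerics or high-bit bytes.
TOKEN_RE = re.compile(rb"[0-9A-Za-z\x80-\xff]+")


def tokenize(raw):
    """Split raw message bytes into tokens without decoding the charset."""
    return TOKEN_RE.findall(raw)


# Japanese sample text: "free sample" + space + "right now".
# Japanese is normally written without spaces between words.
subject = "無料サンプル 今すぐ"

# Same text in two common Japanese encodings.
raw_euc = subject.encode("euc_jp")      # 8-bit, two high-bit bytes per character here
raw_iso = subject.encode("iso2022_jp")  # 7-bit, with escape sequences

tokens_euc = tokenize(raw_euc)
tokens_iso = tokenize(raw_iso)

# Repeatable: the same bytes always produce the same tokens.
print(tokens_euc == tokenize(raw_euc))

# Not word-level: each EUC-JP token is a whole space-delimited phrase.
print(len(tokens_euc))

# Charset-dependent: the same text yields different tokens per encoding.
print(tokens_euc != tokens_iso)
```

If the real lexer behaves anything like this, the tokens would not be
words in any linguistic sense, but a statistical classifier would only need
them to recur consistently in spam and ham.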
David