A suggestion for non-ASCII Scoring
David Relson
relson at osagesoftware.com
Mon Jan 26 13:40:37 CET 2004
On Mon, 26 Jan 2004 09:35:51 -0000
Peter Bishop wrote:
> On 23 Jan 2004 at 10:58, Greg McCann wrote:
>
> > The only situation where I could see this not being helpful is for
> > users that receive legitimate email containing a lot of non-ASCII
> > characters. In that case, they may want to continue scoring
> > non-ASCII words as distinct tokens.
> >
>
> Why use the "replace non-ASCII" option in the first place?
> I don't - so if I look in my database I see some pretty weird tokens
> (Korean/Chinese) but the character sequences still make words
> so even if I don't understand them, bogofilter does.
> So they are classified in the normal way.
> Result - no Korean spam gets through now.
Hi Peter,
replace-nonascii cuts down the number of weird tokens in the wordlist.
It's a space saver. That's all.
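The space saving can be illustrated with a small sketch. It assumes, for illustration only, that replace-nonascii maps every byte at or above 0x80 to a single '?' placeholder and that tokens are split on whitespace. bogofilter's real lexer and replacement character may differ, and the helper names below are hypothetical.

```python
def replace_nonascii(data: bytes, replacement: bytes = b"?") -> bytes:
    # Hypothetical helper: map every byte >= 0x80 to one placeholder byte
    # (assumed behaviour, not bogofilter's actual code).
    return bytes(b if b < 0x80 else replacement[0] for b in data)


def tokens(data: bytes) -> set:
    # Naive whitespace tokenizer; an assumption, bogofilter's lexer is more elaborate.
    return set(data.split())


# Three distinct Korean words, UTF-8 encoded (3 bytes per syllable).
korean = "안녕하세요 감사합니다 사랑해요".encode("utf-8")

raw = tokens(korean)                          # each word is its own token
collapsed = tokens(replace_nonascii(korean))  # words become runs of '?'
print(len(raw), len(collapsed))
```

Three distinct words shrink to two tokens, because the two five-syllable words collapse into the same run of '?'. That is the space saving, and it is also the loss of distinction Greg mentions for users who receive legitimate non-ASCII mail.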
David