A suggestion for non-ASCII Scoring

Mon Jan 26 13:40:37 CET 2004

On Mon, 26 Jan 2004 09:35:51 -0000
Peter Bishop wrote:

> On 23 Jan 2004 at 10:58, Greg McCann wrote:
> 
> > The only situation where I could see this not being helpful is for
> > users that receive legitimate email containing a lot of non-ASCII
> > characters. In that case, they may want to continue scoring
> > non-ASCII words as distinct tokens.
> > 
> 
> Why use the "replace non-ASCII" option in the first place?
> I don't, so if I look in my database I see some pretty weird tokens
> (Korean/Chinese), but the character sequences still make words,
> so even if I don't understand them, bogofilter does.
> So they are classified in the normal way.
> Result: no Korean spam gets through now.

Hi Peter,

replace-nonascii cuts down the number of weird tokens in the database.
It's a space saver.  That's all.
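
As a rough illustration of the space saving (a sketch only, not
bogofilter's actual code), assume the option replaces every byte with
the high bit set by '?'. The function name, the sample tokens, and the
UTF-8 encoding are made up for the example:

```python
def replace_nonascii(token: bytes, placeholder: bytes = b"?") -> bytes:
    """Replace every byte with the high bit set by a placeholder byte.

    Hypothetical helper for illustration; the '?' placeholder is an
    assumption about what the option does.
    """
    return bytes(b if b < 0x80 else placeholder[0] for b in token)


# Sample tokens (invented): three Korean words and one ASCII word.
tokens = [
    "스팸".encode("utf-8"),
    "광고".encode("utf-8"),
    "할인".encode("utf-8"),
    b"viagra",
]

distinct_raw = set(tokens)
distinct_replaced = {replace_nonascii(t) for t in tokens}

# Number of distinct tokens to store without and with replacement.
print(len(distinct_raw), len(distinct_replaced))
```

In this sketch the three Korean tokens collapse into a single
placeholder token, so fewer entries need to be stored, but they can no
longer be scored as distinct words.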

David