A suggestion for non-ASCII Scoring

Greg McCann greg at cambria.com
Fri Jan 23 18:48:08 CET 2004


On 1/23/2004 at 12:20 PM David Relson <relson at osagesoftware.com> wrote:

>On Fri, 23 Jan 2004 09:00:05 -0800
>Greg McCann wrote:

...
>> I would like to propose an option to ignore any ASCII
>> characters within a mostly non-ASCII word and tokenize it as if the
>> word was entirely non-ASCII.
...

>If you want to experiment, I've written a patch that will convert the
>symbols as you want.  The change compiles, but I've not run it, so it
>may not work.  Test it and let us know if it actually helps.

Thank you, David.  This looks great.  It compiled fine with the 0.15.4 source, and a preliminary test shows that it does exactly what I was hoping for.  I will report back after a few days and let you know whether it improves the effectiveness of non-ASCII filtering.
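
For anyone following the thread, the proposal quoted above could be sketched roughly as follows. This is not David's patch (which is not reproduced in this message) and is not bogofilter code; it is only an illustration in Python. The function names, the "more than half of the bytes" reading of "mostly non-ASCII", and the use of '?' as the stand-in for a non-ASCII byte are all assumptions made for the example.

```python
# Illustrative sketch only; names and thresholds are hypothetical,
# not taken from bogofilter or from the patch discussed in this thread.

def is_mostly_non_ascii(word: bytes, threshold: float = 0.5) -> bool:
    """Return True if more than `threshold` of the bytes have the high bit set."""
    if not word:
        return False
    high = sum(1 for b in word if b >= 0x80)
    return high / len(word) > threshold


def normalize_token(word: bytes, placeholder: bytes = b"?") -> bytes:
    """Treat a mostly non-ASCII word as if it were entirely non-ASCII.

    Assumption: "tokenize as entirely non-ASCII" is modeled here by mapping
    every byte of the word, ASCII or not, to the same placeholder byte.
    Words that are not mostly non-ASCII are returned unchanged.
    """
    if is_mostly_non_ascii(word):
        return placeholder * len(word)
    return word


if __name__ == "__main__":
    # Four high-bit bytes with one ASCII letter embedded in the middle.
    print(normalize_token(b"\xc4\xe3a\xba\xc3"))
    # A plain ASCII word passes through untouched.
    print(normalize_token(b"hello"))
```

With this reading, a stray ASCII letter inside an otherwise high-bit word no longer produces a distinct token, which is the effect the proposal is after. Whether the real threshold should be one half, or something else, is left open here.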


Greg McCann





