A suggestion for non-ASCII Scoring
greg at cambria.com
Fri Jan 23 12:48:08 EST 2004
On 1/23/2004 at 12:20 PM David Relson <relson at osagesoftware.com> wrote:
>On Fri, 23 Jan 2004 09:00:05 -0800
>Greg McCann wrote:
>> I would like to propose an option to ignore any ASCII
>> characters within a mostly non-ASCII word and tokenize it as if the
>> word was entirely non-ASCII.
>If you want to experiment, I've written a patch that will convert the
>symbols as you want. The change compiles, but I've not run it, so it
>may not work. Test it and let us know if it actually helps.
Thank you, David. This looks great. It compiled fine with the 0.15.4 source, and a preliminary test shows that it does exactly what I was hoping for. I will report back in a few days on whether it improves the effectiveness of non-ASCII filtering.
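
For anyone following the thread, here is a rough sketch of the idea as proposed. It is not David's patch, which is not reproduced in this message. The function names, the "?" placeholder, and the "more than half the bytes" reading of "mostly non-ASCII" are all assumptions made for illustration.

```python
# Illustrative sketch only; names, placeholder, and threshold are assumptions,
# not taken from the bogofilter source or from the patch discussed above.

def is_mostly_non_ascii(word: bytes, threshold: float = 0.5) -> bool:
    """Return True when more than `threshold` of the bytes have the high bit set."""
    if not word:
        return False
    high = sum(1 for b in word if b >= 0x80)
    return high / len(word) > threshold


def mask_word(word: bytes, placeholder: bytes = b"?") -> bytes:
    """Replace non-ASCII bytes with the placeholder.

    In a mostly non-ASCII word the stray ASCII bytes are replaced as well,
    so the word is treated as if it were entirely non-ASCII.
    """
    if is_mostly_non_ascii(word):
        return placeholder * len(word)
    return bytes(placeholder[0] if b >= 0x80 else b for b in word)
```

With this sketch, a word such as b"\xc4\xe3a\xba\xc3" (one ASCII letter among four high-bit bytes) is masked completely, while an ordinary ASCII word is left untouched.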