What is a word (lexertest)
David Relson
relson at osagesoftware.com
Wed Oct 23 14:10:32 CEST 2002
At 07:44 AM 10/23/02, Matthias Andree wrote:
>On Tue, 22 Oct 2002, David Relson wrote:
>
> > At 06:59 AM 10/22/02, Boris 'pi' Piwinger wrote:
> >
> > >Hi!
> > >
> > >Even though I don't code here, I tested something;-)
> > >
> > >[3.14 at pi ~/local/bogolists]$ echo "»cmsg newgroup«"|lexertest
> > >get_token: 1 '»cmsg'
> > >get_token: 1 'newgroup«'
> >
> > I'm guessing that you wanted the two special characters removed???
>
>I'll object. The codes for « and » are valid national characters in
>other ISO character sets, and these characters clearly will not show
>up in valid cmsg checkgroups. Leaving them in as-is is fine with me.
>Frenchies, scream now ;-)
Matthias,
pi explained to me that the characters are punctuation in some charsets
and national characters in other charsets. So I'm not going to do anything
about them.
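
To illustrate the charset point, here is a minimal sketch in Python. It is not bogofilter or lexertest code, and the helper name classify is made up for this example. It assumes the two characters arrive as the single bytes 0xAB and 0xBB, decodes them under three ISO-8859 charsets, and reports each character's Unicode general category (categories starting with "P" are punctuation, those starting with "L" are letters).

```python
# Illustrative sketch only; not part of bogofilter or lexertest.
import unicodedata

# Assumption: the two characters are the single bytes 0xAB and 0xBB,
# which render as « and » in ISO-8859-1.
RAW = bytes([0xAB, 0xBB])

def classify(raw, charset):
    """Decode raw bytes in charset; return (character, Unicode category) pairs."""
    return [(ch, unicodedata.category(ch)) for ch in raw.decode(charset)]

if __name__ == "__main__":
    for charset in ("iso-8859-1", "iso-8859-2", "iso-8859-5"):
        print(charset, classify(RAW, charset))
```

The same two bytes come out as punctuation under ISO-8859-1 but as letters under ISO-8859-2 and ISO-8859-5, so a lexer that does not know the charset cannot safely strip them.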
> > >[3.14 at pi ~/local/bogolists]$ echo "bla"|lexertest
> > >
> > >Both results are not really satisfying. There might be a reason why
> > >the second does not return anything, but the first is wrong. Well,
> > >here we have the problem that we cannot tell without looking at the
> > >charset.
> >
> > As said in an earlier message, simple words (only letters and digits, no
> > special characters) that are alone on a line are skipped by the current
> > lexer. Hopefully our lexer expert (Clint) can give us a fix.
>
>I'm looking into this.
Great!
>--
>Matthias Andree