What is a word (lexertest)
Matthias Andree
matthias.andree at gmx.de
Wed Oct 23 13:44:04 CEST 2002
On Tue, 22 Oct 2002, David Relson wrote:
> At 06:59 AM 10/22/02, Boris 'pi' Piwinger wrote:
>
> >Hi!
> >
> >Even though I don't code here, I tested something;-)
> >
> >[3.14 at pi ~/local/bogolists]$ echo "»cmsg newgroup«"|lexertest
> >get_token: 1 '»cmsg'
> >get_token: 1 'newgroup«'
>
> I'm guessing that you wanted the two special characters removed???
I object. The byte values used for « and » are valid national characters
in other ISO character sets, and these characters clearly will not show
up in valid cmsg checkgroups. Leaving them in as-is is fine with me.
Frenchies, scream now ;-)
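
To illustrate the charset point, here is a small Python sketch (not part
of bogofilter; the helper name decode_in is made up for this example). It
decodes the two bytes that are « and » in ISO-8859-1 under a few other
ISO-8859 charsets and reports whether the result counts as letters.

```python
# Bytes 0xAB and 0xBB are the guillemets « and » in ISO-8859-1,
# but other ISO-8859 charsets assign letters to the same byte values.
GUILLEMET_BYTES = b"\xab\xbb"


def decode_in(charset):
    """Hypothetical helper: decode the two bytes in the given charset."""
    return GUILLEMET_BYTES.decode(charset)


for charset in ("iso8859_1", "iso8859_2", "iso8859_5"):
    text = decode_in(charset)
    # str.isalpha() is True only if every decoded character is a letter.
    print(charset, text, text.isalpha())
```

Without knowing the charset, a lexer cannot tell which of these cases it
is looking at, which is why stripping the bytes unconditionally is risky.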
> >[3.14 at pi ~/local/bogolists]$ echo "bla"|lexertest
> >
> >Both results are not really satisfying. There might be a reason why
> >the second does not return anything, but the first is wrong. Well,
> >here we have the problem that we cannot tell without looking at the
> >charset.
>
> As said in an earlier message, simple words (only letters and digits, no
> special characters) that are alone on a line are skipped by the current
> lexer. Hopefully our lexer expert (Clint) can give us a fix.
I'm looking into this.
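
For comparison, this is the behaviour one would expect for the two test
inputs if tokens were simply whitespace-separated runs. It is only a
sketch under that assumption, not bogofilter's actual lexer, and the
function name tokens is hypothetical.

```python
def tokens(line):
    """Hypothetical stand-in for the lexer: split on whitespace only."""
    return line.split()


for line in ("»cmsg newgroup«", "bla"):
    # A word that is alone on its line should still come back as a token.
    print(repr(line), "->", tokens(line))
```

Under this assumption the single word "bla" yields one token instead of
being skipped, and the guillemets stay attached to their words.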
--
Matthias Andree