What is a word (lexertest)

Matthias Andree matthias.andree at gmx.de
Wed Oct 23 13:44:04 CEST 2002


On Tue, 22 Oct 2002, David Relson wrote:

> At 06:59 AM 10/22/02, Boris 'pi' Piwinger wrote:
> 
> >Hi!
> >
> >Even though I don't code here, I tested something;-)
> >
> >[3.14 at pi ~/local/bogolists]$ echo "»cmsg newgroup«"|lexertest
> >get_token: 1 '»cmsg'
> >get_token: 1 'newgroup«'
> 
> I'm guessing that you wanted the two special characters removed???

I'll object. The bytes that display as « and » are valid national
characters in other ISO character sets, and these characters clearly
will not show up in valid cmsg checkgroups. Leaving them as-is is fine
with me.
Frenchies, scream now ;-)
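
To illustrate the charset point, here is a minimal Python sketch. It is
not bogofilter or lexertest code, and decode_in is a hypothetical helper
made up for this example. It assumes the input was typed as ISO-8859-1,
where « and » are the bytes 0xAB and 0xBB, and shows what the same two
bytes become under a couple of other ISO-8859 charsets.

```python
# Hypothetical illustration only; not part of bogofilter or lexertest.
# Assumption: the message was typed as ISO-8859-1, where the two
# guillemets are the single bytes 0xAB and 0xBB.
RAW = bytes([0xAB, 0xBB])


def decode_in(charset: str) -> str:
    """Decode the same two raw bytes under the given charset."""
    return RAW.decode(charset)


if __name__ == "__main__":
    # Python's codec names for ISO-8859-1, -2 and -5.
    for charset in ("iso8859_1", "iso8859_2", "iso8859_5"):
        text = decode_in(charset)
        code_points = [f"U+{ord(c):04X}" for c in text]
        kind = "letters" if text.isalpha() else "not letters"
        print(charset, code_points, kind)
```

A lexer that only sees raw bytes cannot tell these cases apart, so
stripping 0xAB and 0xBB unconditionally would also strip letters from
mail written in those other charsets.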

> >[3.14 at pi ~/local/bogolists]$ echo "bla"|lexertest
> >
> >Neither result is really satisfying. There might be a reason why
> >the second does not return anything, but the first is wrong. Well,
> >here we have the problem that we cannot tell without looking at the
> >charset.
> 
> As said in an earlier message, simple words (only letters and digits, no 
> special characters) that are alone on a line are skipped by the current 
> lexer.  Hopefully our lexer expert (Clint) can give us a fix.

I'm looking into this.

-- 
Matthias Andree



