What is a word (lexertest)

David Relson relson at osagesoftware.com
Wed Oct 23 14:10:32 CEST 2002


At 07:44 AM 10/23/02, Matthias Andree wrote:

>On Tue, 22 Oct 2002, David Relson wrote:
>
> > At 06:59 AM 10/22/02, Boris 'pi' Piwinger wrote:
> >
> > >Hi!
> > >
> > >Even though I don't code here, I tested something;-)
> > >
> > >[3.14 at pi ~/local/bogolists]$ echo "»cmsg newgroup«"|lexertest
> > >get_token: 1 '»cmsg'
> > >get_token: 1 'newgroup«'
> >
> > I'm guessing that you wanted the two special characters removed???
>
>I'll object. « and » have valid national characters in other ISO
>character sets, and these characters clearly will not show in
>valid cmsg checkgroups. Leaving them in as-is is fine with me.
>Frenchies, scream now ;-)

Matthias,

pi explained to me that the characters are punctuation in some charsets
and national characters in other charsets.  So I'm not going to do anything
about them.
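
To illustrate the ambiguity, here is a minimal sketch (Python, used purely for illustration; it is not bogofilter code). It assumes the two characters arrive as the single bytes 0xAB and 0xBB, which is how ISO-8859-1 encodes « and ». The helper name `describe` is hypothetical. The same two bytes decode to punctuation in one charset and to letters in others, so a lexer that sees only bytes cannot classify them without knowing the charset.

```python
import unicodedata

# The two bytes that ISO-8859-1 renders as « and » (assumed input encoding).
RAW = bytes([0xAB, 0xBB])

def describe(charset):
    """Decode RAW in the given charset and pair each character
    with its Unicode general category (P* = punctuation, L* = letter)."""
    return [(ch, unicodedata.category(ch)) for ch in RAW.decode(charset)]

for charset in ("iso-8859-1", "iso-8859-2", "iso-8859-5"):
    print(charset, describe(charset))
```

In ISO-8859-1 both bytes come out as quotation-mark punctuation; in ISO-8859-2 and ISO-8859-5 they come out as letters.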


> > >[3.14 at pi ~/local/bogolists]$ echo "bla"|lexertest
> > >
> > >Both results are not really satisfying. There might be a reason why
> > >the second does not return anything, but the first is wrong. Well,
> > >here we have the problem that we cannot tell without looking at the
> > >charset.
> >
> > As said in an earlier message, simple words (only letters and digits, no
> > special characters) that are alone on a line are skipped by the current
> > lexer.  Hopefully our lexer expert (Clint) can give us a fix.
>
>I'm looking into this.

Great!


>--
>Matthias Andree




