What is a word (lexertest)

David Relson relson at osagesoftware.com
Tue Oct 22 13:30:44 CEST 2002


At 06:59 AM 10/22/02, Boris 'pi' Piwinger wrote:

>Hi!
>
>Even though I don't code here, I tested something;-)
>
>[3.14 at pi ~/local/bogolists]$ echo "»cmsg newgroup«"|lexertest
>get_token: 1 '»cmsg'
>get_token: 1 'newgroup«'
>[3.14 at pi ~/local/bogolists]$ echo "bla"|lexertest
>
>Neither result is really satisfying. There might be a reason why
>the second does not return anything, but the first is wrong. Well,
>here we have the problem that we cannot tell without looking at the
>charset.
>
>pi

pi,

There _is_ a problem with the lexer.

If a line consists of exactly one token (composed only of letters and
digits), the lexer will ignore it.

If there are delimiters (spaces, punctuation, control characters) at the
beginning or the end of the line, the lexer will return the token.

If there are special characters (underscore, dash, etc.) in the token, the
lexer will return the token.
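
To make the three cases above easier to try out, here is a rough sketch that models only the behaviour described in this message. It is not the lexer's actual rule set. The function name reported_as_ignored is made up for illustration, and it assumes that "letters and digits" means ASCII alphanumerics and that the newline added by echo does not count as a delimiter.

```python
import re

# Assumption: "letters and digits" means ASCII alphanumerics; the real
# lexer's character classes may differ.
BARE_TOKEN = re.compile(r"[A-Za-z0-9]+")


def reported_as_ignored(line: str) -> bool:
    """Return True if the line matches the case described as ignored:
    exactly one token of letters and digits, with nothing before or after.

    Hypothetical helper; it models the report, not the lexer itself.
    """
    # Drop the trailing newline that echo appends before matching.
    return BARE_TOKEN.fullmatch(line.rstrip("\n")) is not None


if __name__ == "__main__":
    samples = ["bla", " bla", "bla ", "bla_bla", "»cmsg newgroup«"]
    for sample in samples:
        print(repr(sample), reported_as_ignored(sample))
```

Inputs for which this returns True are the ones worth piping through lexertest when checking a fix.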

We need our lexer expert here!  Clint Adams, are you watching?

David




