What is a word (lexertest)
David Relson
relson at osagesoftware.com
Tue Oct 22 13:30:44 CEST 2002
At 06:59 AM 10/22/02, Boris 'pi' Piwinger wrote:
>Hi!
>
>Even though I don't code here, I tested something;-)
>
>[3.14 at pi ~/local/bogolists]$ echo "»cmsg newgroup«"|lexertest
>get_token: 1 '»cmsg'
>get_token: 1 'newgroup«'
>[3.14 at pi ~/local/bogolists]$ echo "bla"|lexertest
>
>Neither result is really satisfying. There might be a reason why
>the second does not return anything, but the first is wrong. Well,
>here we have the problem that we cannot tell without looking at the
>charset.
>
>pi
pi,
There _is_ a problem with the lexer.
If a line consists of exactly one token made up only of letters and digits
(such as "bla"), the lexer ignores it.
If there are delimiters (spaces, punctuation, control characters) at the
beginning or the end of the line, the lexer returns the token.
If the token contains special characters (underscore, dash, etc.), the
lexer returns it.
We need our lexer expert here! Clint Adams, are you watching?
David