Radical lexers (was: Test with different lexers)
Boris 'pi' Piwinger
3.14 at logic.univie.ac.at
Wed Dec 10 14:51:43 CET 2003
Boris 'pi' Piwinger wrote:
> Next I will test (I don't promise any time too soon) is not
> allowing any punctuation at all.
OK, this is only a very short test (I have already spent too
much time on bogofilter over the last two days ;-). I compare
my version of the lexer
http://piology.org/bogofilter/lexer_v3.l with a much
stricter version of it. TOKEN will effectively be of the form
[^[:blank:][:cntrl:]<>;&%@|/\\{}^"*,[\]=()+?:#$._!'`~-]+
So it no longer matters where in a token a character
shows up. NO punctuation (I hope I did not miss anything).
So basically letters, digits and characters outside ASCII
are allowed.
And, even more extreme, a third version where tokens are explicitly: [[:alnum:]]+
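To make the two token definitions concrete, here is a rough sketch of them as Python regular expressions. This is not the flex source. Python's re module has no POSIX classes, so [:blank:] is spelled out as space and tab, [:cntrl:] as the ASCII control range, and [:alnum:] is assumed to mean ASCII letters and digits only. The names PUNCT, RADICAL and MOST_RADICAL and the sample text are made up for the illustration.

```python
import re

# b) "radical" lexer: any run of characters that are not blank, control,
# or one of the listed ASCII punctuation characters.
# [:blank:] -> space and tab, [:cntrl:] -> \x00-\x1f and \x7f (assumption: ASCII).
PUNCT = r"""<>;&%@|/\\{}^"*,\[\]=()+?:#$._!'`~-"""
RADICAL = re.compile(r"[^ \t\x00-\x1f\x7f" + PUNCT + r"]+")

# c) "most radical" lexer: [[:alnum:]]+, assumed here to be ASCII-only.
MOST_RADICAL = re.compile(r"[A-Za-z0-9]+")

text = "Free_money!!! café http://example.com/?id=42"
print(RADICAL.findall(text))       # keeps the non-ASCII letter inside its token
print(MOST_RADICAL.findall(text))  # cuts the token at the non-ASCII letter
```

On plain ASCII input the two patterns give the same tokens. They only differ once characters outside ASCII show up.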
Here is what I get:
a) my version of the lexer
27060k
fn=210/13612
fp=16/15670
b) radical lexer
26832k
fn=206/13612
fp=17/15670
c) most radical lexer
23332k
fn=210/13612
fp=17/15670
So the size is a surprise. I expected something much smaller
for b) and even more so for c).
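For easier comparison, the figures above can be turned into percentages. This is only a small sketch: the numbers are copied from the three runs, while the names runs, FN_TOTAL, FP_TOTAL and summarize are invented for the illustration.

```python
# Figures copied from runs a), b) and c) above.
runs = {
    "a": {"size_k": 27060, "fn": 210, "fp": 16},
    "b": {"size_k": 26832, "fn": 206, "fp": 17},
    "c": {"size_k": 23332, "fn": 210, "fp": 17},
}
FN_TOTAL = 13612  # denominator of every fn figure
FP_TOTAL = 15670  # denominator of every fp figure


def summarize(run, baseline):
    """Return error rates in percent and the size relative to the baseline run."""
    return {
        "fn_pct": 100.0 * run["fn"] / FN_TOTAL,
        "fp_pct": 100.0 * run["fp"] / FP_TOTAL,
        "size_vs_a_pct": 100.0 * run["size_k"] / baseline["size_k"],
    }


for name, run in runs.items():
    s = summarize(run, runs["a"])
    print(f"{name}) fn={s['fn_pct']:.2f}%  fp={s['fp_pct']:.2f}%  "
          f"size={s['size_vs_a_pct']:.1f}% of a)")
```

Relative to a), the size comes out at roughly 99% for b) and roughly 86% for c).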
The result for b) hurts. It suggests (if it can be confirmed)
that we are making things far too complicated when defining
a token. I really did not expect that lexer to work. But
well, that's how it is.
c) is really mind-blowing. This simply MUST NOT work.
pi