radical lexer
David Relson
relson at osagesoftware.com
Sun Nov 26 19:16:23 CET 2006
Hi pi,
A quick comparison of bogofilter's lexer_v3.l and your radical lexer
was interesting, particularly the following line:
TOKENBORDER [^[:blank:][:cntrl:]<>;&%@|/\\{}^"*,[\]=()+?:#$._!'`~-]
Evidently you're excluding lots of characters from tokens, notably
dollar sign, period, underscore, exclamation mark, apostrophe, and
hyphen. Thus the following tokens have different meanings for you and
me:
$123
domain.com
domain_name
bad!!!
don't
un-complicated
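
To make the difference concrete, here is a rough Python sketch of how the class above would split those six strings. It assumes the radical lexer treats each maximal run of characters matching TOKENBORDER as one token, which the class alone does not confirm. The names tokenize, EXCLUDED, and TOKEN_RE are made up for this illustration, and [:blank:] and [:cntrl:] are approximated by their ASCII members.

```python
import re

# Characters listed inside the negated TOKENBORDER class, i.e. the ones
# that can never be part of a token.
# Assumption: [:blank:] is space and tab, [:cntrl:] is ASCII 0-31 and 127.
EXCLUDED = (
    " \t"
    + "".join(chr(c) for c in range(32))
    + "\x7f"
    + "<>;&%@|/\\{}^\"*,[]=()+?:#$._!'`~-"
)

# A token is taken to be a maximal run of characters NOT in EXCLUDED.
TOKEN_RE = re.compile("[^" + re.escape(EXCLUDED) + "]+")


def tokenize(text):
    """Return the runs of non-excluded characters found in text."""
    return TOKEN_RE.findall(text)


for sample in ["$123", "domain.com", "domain_name",
               "bad!!!", "don't", "un-complicated"]:
    print(sample, "->", tokenize(sample))
# $123 -> ['123']
# domain.com -> ['domain', 'com']
# domain_name -> ['domain', 'name']
# bad!!! -> ['bad']
# don't -> ['don', 't']
# un-complicated -> ['un', 'complicated']
```

Under that assumption, every example either loses its punctuation or breaks into two shorter tokens.
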
By the way, I probably would have used the name TOKENCHAR instead of
TOKENBORDER, since the negated class matches the characters inside a
token :->
Cheers!
David