radical lexer

Boris 'pi' Piwinger 3.14 at piology.org
Sun Nov 26 19:54:35 CET 2006


David Relson <relson at osagesoftware.com> wrote:

>A quick comparison of bogofilter's lexer_v3.l and your radical lexer
>was interesting, particularly the following line:
>
>TOKENBORDER [^[:blank:][:cntrl:]<>;&%@|/\\{}^"*,[\]=()+?:#$._!'`~-]
>
>Evidently you're excluding lots of characters from tokens, notably
>dollar sign, period, underscore, exclamation mark, apostrophe, and
>hyphen.  Thus the following tokens have different meanings for you and
>me:
>
>  $123
>  domain.com
>  domain_name
>  bad!!!
>  don't
>  un-complicated

You are absolutely right. This is the radical redesign. In
the end it did a little bit better than the standard lexer,
so at the very least I reduced complexity at no cost.
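
To illustrate the effect on the tokens David lists above, here is
a minimal Python sketch (not part of the lexer itself). It assumes
a token is simply a maximal run of characters matching TOKENBORDER,
approximates [:blank:] as space and tab and [:cntrl:] as the ASCII
control range, and ignores any length limits or other rules the
real lexer may apply. The names EXCLUDED, TOKEN_RE and
radical_tokens are made up for this example.

```python
import re

# Punctuation excluded by the TOKENBORDER class quoted above.
EXCLUDED = "<>;&%@|/\\{}^\"*,[]=()+?:#$._!'`~-"

# Assumption: [:blank:] is space/tab, [:cntrl:] is 0x00-0x1f plus 0x7f.
# A token is taken to be one or more characters outside those sets.
TOKEN_RE = re.compile("[^ \\t\\x00-\\x1f\\x7f" + re.escape(EXCLUDED) + "]+")


def radical_tokens(text):
    """Split text into maximal runs of TOKENBORDER characters."""
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    for sample in ["$123", "domain.com", "domain_name",
                   "bad!!!", "don't", "un-complicated"]:
        print(sample, "->", radical_tokens(sample))
```

Under these assumptions "domain.com" comes out as the two tokens
"domain" and "com", and "$123" loses its dollar sign, which is
exactly the difference David points out.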

>By the way, I probably would have used name TOKENCHAR instead of
>TOKENBORDER :->

You are right. Actually, the name is historical: I first
chose to make the start and end of a token the same but
different from the middle. A new version is in progress to
reduce the differences from your version (such as
backslashes and whitespace).

pi


