Radical lexers

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Thu Dec 11 13:14:22 CET 2003


Boris 'pi' Piwinger wrote:

Per David's request, here is some more information:

> my version (a) of the lexer (http://piology.org/bogofilter/lexer_v3.l)
> much stricter version of it (b): [^[:blank:][:cntrl:]<>;&%@|/\\{}^"*,[\]=()+?:#$._!'`~-]+
> even more extreme (c). Tokens are explicitly: [[:alnum:]]+
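
To make the difference between (b) and (c) concrete, here is a rough Python sketch of the two token patterns. It is not the flex lexer itself: Python's re module has no POSIX classes, so [:blank:] is approximated as space and tab, [:cntrl:] as \x00-\x1f plus \x7f, and [:alnum:] as ASCII letters and digits (in flex it depends on the locale). The names TOKEN_B, TOKEN_C and tokenize are made up for this example.

```python
import re

# (b) stricter variant: a token is a run of characters that are neither
# blank, control, nor one of the listed punctuation characters.
# Assumption: [:blank:] -> space/tab, [:cntrl:] -> \x00-\x1f and \x7f.
TOKEN_B = re.compile(
    r"""[^ \t\x00-\x1f\x7f<>;&%@|/\\{}^"*,\[\]=()+?:#$._!'`~-]+"""
)

# (c) most extreme variant: [[:alnum:]]+
# Assumption: approximated here as ASCII letters and digits only.
TOKEN_C = re.compile(r"[A-Za-z0-9]+")


def tokenize(pattern, text):
    """Return all non-overlapping matches of pattern in text, in order."""
    return pattern.findall(text)


if __name__ == "__main__":
    sample = "caf\u00e9 price=$100 foo_bar"
    print(tokenize(TOKEN_B, sample))
    print(tokenize(TOKEN_C, sample))
```

Under these approximations the two patterns only differ on characters outside ASCII: (b) keeps them inside a token, (c) splits on them.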

New table:
>    wordlist compacted false neg false pos 1-tokens 1-sig 2-tokens 2-sig
> a) 27060k   19364k    210/13612 16/15670  179      60    4235     998
> b) 26832k   18128k    206/13612 17/15670  178      60    4473     988
> c) 23332k   16368k    210/13612 18/15670  62       0     2678     832

So we see that the size effect is still small for the compacted db:
about 6% smaller for (b) and about 15% smaller for (c), relative to (a).
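
For anyone who wants to redo the arithmetic, the table can be turned into error rates and relative size changes with a few lines of Python. The numbers are copied from the table; the names results, rate and shrink are only for this sketch.

```python
# Figures copied from the table: sizes in k, errors as (count, total).
results = {
    "a": {"wordlist": 27060, "compacted": 19364,
          "false_neg": (210, 13612), "false_pos": (16, 15670)},
    "b": {"wordlist": 26832, "compacted": 18128,
          "false_neg": (206, 13612), "false_pos": (17, 15670)},
    "c": {"wordlist": 23332, "compacted": 16368,
          "false_neg": (210, 13612), "false_pos": (18, 15670)},
}


def rate(count, total):
    """Error rate in percent."""
    return 100.0 * count / total


def shrink(variant, key, baseline="a"):
    """Size reduction of a variant relative to the baseline, in percent."""
    base = results[baseline][key]
    return 100.0 * (base - results[variant][key]) / base


if __name__ == "__main__":
    for name, r in results.items():
        print(f"{name}) false neg {rate(*r['false_neg']):.2f}%, "
              f"false pos {rate(*r['false_pos']):.2f}%, "
              f"compacted reduced by {shrink(name, 'compacted'):.1f}%")
```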

I don't see an easy way to also check for 1- and 2-tokens
with tags. It is easy to grep them, but harder to see if
they are significant.
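
One possible way to count them, as a sketch only: if the wordlist were dumped into a Python dict of token -> (spam count, ham count), the tag could be stripped before measuring the length, and significance could be judged by the distance of a Robinson-style score from 0.5. Everything specific here is an assumption: the "tag:word" token format, the parameter values ROBX, ROBS and MIN_DEV, and the helper names. It does not read a real bogofilter database.

```python
# Assumed parameters; real values depend on the bogofilter configuration.
ROBX = 0.5      # assumed score for a token with no history
ROBS = 0.1      # assumed weight given to ROBX
MIN_DEV = 0.1   # assumed minimum distance from 0.5 to call a token significant


def strip_tag(token):
    """Drop a leading tag, assuming tagged tokens look like 'subj:word'."""
    return token.split(":", 1)[1] if ":" in token else token


def spamicity(spam_count, ham_count, spam_msgs, ham_msgs):
    """Robinson-style score: f(w) = (s*x + n*p(w)) / (s + n)."""
    b = spam_count / spam_msgs
    g = ham_count / ham_msgs
    n = spam_count + ham_count
    p = b / (b + g) if (b + g) > 0 else ROBX
    return (ROBS * ROBX + n * p) / (ROBS + n)


def short_token_stats(counts, spam_msgs, ham_msgs, length):
    """Return (number of tokens of that length, number of significant ones)."""
    total = significant = 0
    for token, (spam_count, ham_count) in counts.items():
        if len(strip_tag(token)) != length:
            continue
        total += 1
        score = spamicity(spam_count, ham_count, spam_msgs, ham_msgs)
        if abs(score - 0.5) >= MIN_DEV:
            significant += 1
    return total, significant


if __name__ == "__main__":
    # Made-up counts, only to show the call.
    counts = {"a": (50, 1), "subj:x": (5, 5), "ab": (1, 1), "subj:hello": (10, 0)}
    print(short_token_stats(counts, spam_msgs=100, ham_msgs=100, length=1))
```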

pi



