Radical lexers
Boris 'pi' Piwinger
3.14 at logic.univie.ac.at
Thu Dec 11 13:14:22 CET 2003
Boris 'pi' Piwinger wrote:
Per David's request, some more information:
> my version (a) of the lexer (http://piology.org/bogofilter/lexer_v3.l)
> much stricter version of it (b): [^[:blank:][:cntrl:]<>;&%@|/\\{}^"*,[\]=()+?:#$._!'`~-]+
> even more extreme (c). Tokens are explicitly: [[:alnum:]]+
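
To make the difference between (b) and (c) concrete, here is a rough Python sketch of the two token patterns. It is not the flex lexer itself: it assumes [:blank:] means space and tab, [:cntrl:] means 0x00-0x1f plus 0x7f, and [:alnum:] means ASCII letters and digits. The names EXCLUDED_B, PATTERN_B, PATTERN_C and tokens are made up for the illustration. Lexer (a) is not reproduced here.

```python
import re

# Characters that (b) excludes besides blanks and control characters,
# copied from the bracket expression quoted above.
EXCLUDED_B = "<>;&%@|/\\{}^\"*,[]=()+?:#$._!'`~-"

# Assumption: [:blank:] = space and tab, [:cntrl:] = 0x00-0x1f and 0x7f.
PATTERN_B = re.compile("[^ \\t\\x00-\\x1f\\x7f" + re.escape(EXCLUDED_B) + "]+")

# Assumption: [:alnum:] = ASCII letters and digits (C locale).
PATTERN_C = re.compile("[A-Za-z0-9]+")


def tokens(pattern, text):
    """Return all maximal runs of characters accepted by the pattern."""
    return pattern.findall(text)


sample = "Win 100% FREE caf\u00e9_coupons at http://example.com/buy-now!"
print(tokens(PATTERN_B, sample))
print(tokens(PATTERN_C, sample))
```

Under these assumptions (b) already excludes every ASCII punctuation character, so the two patterns only disagree on characters outside ASCII: (b) keeps the accented word in the sample whole, (c) cuts it at the accented letter.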
New table:
>     wordlist  compacted  false neg  false pos  1-tokens  1-sig  2-tokens  2-sig
> a)  27060k    19364k     210/13612  16/15670   179       60     4235      998
> b)  26832k    18128k     206/13612  17/15670   178       60     4473      988
> c)  23332k    16368k     210/13612  18/15670   62        0      2678      832
So we see that the size effect is still small for the compacted db
(roughly 6% for (b) and 15% for (c), relative to (a)).
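
For anyone who wants the table as percentages, a small Python sketch follows. The numbers are copied from the table; reading "210/13612" as errors out of messages tested is my assumption, and the names runs, rate and shrink are only for this example.

```python
# Figures copied from the table; sizes are in kilobytes.
# Assumption: "x/y" means x misclassified out of y messages tested.
runs = {
    "a": {"wordlist": 27060, "compacted": 19364, "fn": (210, 13612), "fp": (16, 15670)},
    "b": {"wordlist": 26832, "compacted": 18128, "fn": (206, 13612), "fp": (17, 15670)},
    "c": {"wordlist": 23332, "compacted": 16368, "fn": (210, 13612), "fp": (18, 15670)},
}


def rate(pair):
    """Error rate in percent for an (errors, total) pair."""
    errors, total = pair
    return 100.0 * errors / total


def shrink(base, value):
    """Size reduction in percent of value relative to base."""
    return 100.0 * (base - value) / base


for name, run in runs.items():
    print(
        name,
        "fn %.2f%%" % rate(run["fn"]),
        "fp %.3f%%" % rate(run["fp"]),
        "wordlist -%.1f%%" % shrink(runs["a"]["wordlist"], run["wordlist"]),
        "compacted -%.1f%%" % shrink(runs["a"]["compacted"], run["compacted"]),
    )
```

The reductions are measured against lexer (a), so its own row shows zero.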
I don't see an easy way to also check for 1- and 2-tokens
with tags. It is easy to grep them, but harder to see if
they are significant.
pi