[patch] small lexer changes
Mark M. Hoffman
mhoffman at lightlink.com
Thu Oct 10 07:10:00 CEST 2002
Hello Matthias:
* Matthias Andree <matthias.andree at gmx.de> [2002-10-10 02:01:02 +0200]:
> See the current lexertest.c, it has a -q option. It turned out
> printf took some time, but not so much as to ruin the data.
>
> %option     user time        text size   time   size
> full:       6.04 ± 0.06 s      1481727    +0%    -0%   (5 samples)
> full ecs:   6.33 ± 0.10 s       340843    +5%   -77%   (5 samples)
> fast:       6.79 ± 0.06 s      2836415   +12%   +91%   (3 samples)
> ecs:        7.41 ± 0.08 s       105247   +23%   -93%   (5 samples)
My results are similar (I tried the default, i.e. no table option, instead of 'fast'):
%option     user time   text size   time   size
full:       .546          1515388    +0%    -0%   (5 samples)
full ecs:   .602           374005   +10%   -75%   (5 samples)
ecs:        .662           139695   +21%   -91%   (5 samples)
<none>:     .614           139695   +12%   -91%   (5 samples)
I ran these tests on the public corpus that was mentioned earlier.
That's 1100 messages per sample (12.2M uncompressed).
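The two right-hand percentage columns in both tables appear to be each row's change in time and text size relative to the 'full' row, rounded to a whole percent. That reading is an assumption, but recomputing it from the posted figures reproduces every delta. A minimal C sketch (the names struct row, pct_change, print_deltas and print_all_deltas are hypothetical, not part of bogofilter or lexertest.c):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One benchmark row: flex table option(s), mean user time (units as
 * reported in the mail), and scanner text size in bytes. */
struct row {
    const char *option;
    double time;
    double size;
};

/* Percentage change of value relative to base, rounded half away from
 * zero to a whole percent. Assumption: this is how the mail's deltas
 * were derived. */
static long pct_change(double value, double base)
{
    double p = (value - base) / base * 100.0;
    return (long)(p < 0.0 ? p - 0.5 : p + 0.5);
}

/* Print time and size deltas of every row against rows[0], the
 * baseline ('full' in both tables). */
static void print_deltas(const char *title, const struct row *rows, size_t n)
{
    assert(n > 0 && rows[0].time > 0.0 && rows[0].size > 0.0);
    printf("%s\n", title);
    for (size_t i = 0; i < n; i++)
        printf("  %-9s %+4ld%% %+4ld%%\n", rows[i].option,
               pct_change(rows[i].time, rows[0].time),
               pct_change(rows[i].size, rows[0].size));
}

/* Mean figures copied from the two tables (the ± spreads are omitted). */
static const struct row matthias[] = {
    { "full",     6.04, 1481727 },
    { "full ecs", 6.33,  340843 },
    { "fast",     6.79, 2836415 },
    { "ecs",      7.41,  105247 },
};

static const struct row mark[] = {
    { "full",     .546, 1515388 },
    { "full ecs", .602,  374005 },
    { "ecs",      .662,  139695 },
    { "<none>",   .614,  139695 },
};

/* Print the recomputed deltas for both tables. */
void print_all_deltas(void)
{
    print_deltas("Matthias", matthias, sizeof matthias / sizeof matthias[0]);
    print_deltas("Mark", mark, sizeof mark / sizeof mark[0]);
}
```

Calling print_all_deltas() from a main() gives, for example, +10% time and -75% size for 'full ecs' in the second table, matching the figures above.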
So it looks like simply dropping 'full' is the winner; with no table
option, flex uses its default compression, which corresponds to -Cem on
the command line. I'm checking this in now.
Thanks for the data and lexertest mod.
> So I'd suggest "full ecs" or "ecs". "fast" is unacceptable in wasting
> space AND time. Probably a tribute to cache effects, but I'm too lazy to
> run cachegrind now, I'm not about to tune flex(1).
This lexer doesn't meet the criteria for using 'fast', so the blowup
in size doesn't surprise me.
Regards,
--
Mark M. Hoffman
mhoffman at lightlink.com
More information about the bogofilter-dev mailing list