[patch] small lexer changes

Mark M. Hoffman mhoffman at lightlink.com
Thu Oct 10 07:10:00 CEST 2002


Hello Matthias:

* Matthias Andree <matthias.andree at gmx.de> [2002-10-10 02:01:02 +0200]:

> See the current lexertest.c, it has a -q option. It turned out
> printf took some time, but not so much as to ruin the data.
> 
> %option     user time     text size time size
> full:       6.04 ± 0.06 s   1481727  +0%  -0% (5 samples)
> full ecs:   6.33 ± 0.10 s    340843  +5% -77% (5 samples)
> fast:       6.79 ± 0.06 s   2836415 +12% +91% (3 samples)
> ecs:        7.41 ± 0.08 s    105247 +23% -93% (5 samples)

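My results are similar (I tried the default, i.e. no %option, instead
of 'fast'; the last two columns are relative to 'full', as in your table):
  %option     time           text size time size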
  full:       .546             1515388  +0%  -0% (5 samples)
  full ecs:   .602              374005 +10% -75% (5 samples)
  ecs:        .662              139695 +21% -91% (5 samples)
  <none>:     .614              139695 +12% -91% (5 samples)
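
For anyone who wants to recheck the percentage columns, here is a minimal
sketch in Python. It assumes 'full' is the baseline for both columns, as in
the quoted table; the dictionary and function names are only illustrative
and are not part of lexertest.

```python
# Figures copied from the table above: (time, text size in bytes).
# Assumption: 'full' is the baseline for both relative columns.
results = {
    "full": (0.546, 1515388),
    "full ecs": (0.602, 374005),
    "ecs": (0.662, 139695),
    "<none>": (0.614, 139695),
}


def relative_deltas(results, baseline="full"):
    """Return {option: (time delta %, size delta %)} relative to baseline."""
    base_time, base_size = results[baseline]
    return {
        name: (
            round(100 * (t / base_time - 1)),
            round(100 * (s / base_size - 1)),
        )
        for name, (t, s) in results.items()
    }


deltas = relative_deltas(results)
for name, (dt, ds) in deltas.items():
    print(f"{name + ':':<10} {dt:+d}% {ds:+d}%")
```

Rounding to whole percent reproduces the +10/-75, +21/-91 and +12/-91
figures listed above.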

I ran these tests on the public corpus that was mentioned earlier.
That's 1100 messages per sample (12.2M uncompressed).

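So it looks like simply dropping 'full' is the winner: about 12% slower
than 'full' here, but 91% smaller.  With no %option the lexer gets flex's
default table compression, which corresponds to -Cem on the command line.
I'm checking this in now.  Thanks for the data and the lexertest mod.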

> So I'd suggest "full ecs" or "ecs". "fast" is unacceptable in wasting
> space AND time. Probably a tribute to cache effects, but I'm too lazy to
> run cachegrind now, I'm not about to tune flex(1).

This lexer doesn't meet the criteria for using 'fast', so the blowup
in size doesn't surprise me.

Regards,

-- 
Mark M. Hoffman
mhoffman at lightlink.com



More information about the bogofilter-dev mailing list