Messages that slow bogofilter down (was: profiling)

David Relson relson at osagesoftware.com
Thu Feb 20 23:50:23 CET 2003


Let's summarize:

message    lines       size     time

2.txt     67,718   5.214 MB   10.90s
3.txt      1,399     106 KB    5.66s
4.txt      8,161     630 KB  151.13s

2.txt - mostly a large attachment - "TestReport.doc"; base64 encoding
3.txt - 100,000 repetitions of the character 'x' in a MIME chunk; quoted-printable
4.txt - 600,000 repetitions of the character 'x' in a MIME chunk; quoted-printable
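
For reference, a pathological message like 4.txt is easy to regenerate.
Here is a minimal sketch in C that writes a quoted-printable text/plain
message whose body is 600,000 repetitions of 'x'; the headers, sender
address, and 72-column soft wrap are assumptions for illustration, not
details taken from the original test files.

/* Hypothetical generator for a 4.txt-style stress message: a
 * quoted-printable MIME body of 600,000 'x' characters, soft-wrapped
 * with '=' at 72 columns.  Headers are illustrative only. */
#include <stdio.h>

int main(void)
{
    const long total = 600000;  /* repetitions of 'x' */
    const int  width = 72;      /* quoted-printable line width */
    long i;

    puts("From: test@example.com");
    puts("Subject: lexer stress test");
    puts("MIME-Version: 1.0");
    puts("Content-Type: text/plain; charset=us-ascii");
    puts("Content-Transfer-Encoding: quoted-printable");
    puts("");

    for (i = 1; i <= total; i++) {
        putchar('x');
        if (i % width == 0 && i < total)
            puts("=");          /* quoted-printable soft line break */
    }
    putchar('\n');
    return 0;
}

Piping its output into bogofilter should reproduce the slowdown.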

The problem is that processing these messages takes too long.  Profiling of
4.txt shows the time is spent inside low-level lexer code, presumably
matching the 600,000 x's against the rule set in lexer_text_plain.l.

Here are the first few lines of gprof's output:

Each sample counts as 0.01 seconds.
   %   cumulative   self              self     total
  time   seconds   seconds    calls   s/call   s/call  name
  89.12    374.17   374.17     8105     0.05     0.05  yy_get_previous_state
  10.82    419.61    45.44     8105     0.01     0.01  yy_get_next_buffer
   0.03    419.72     0.11      103     0.00     4.08  text_plain_lex
   0.01    419.76     0.04     8167     0.00     0.00  xfgetsl
   0.01    419.79     0.03     8179     0.00     0.00  get_decoded_line

The question is:  "How can bogofilter handle these messages faster?"
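
One possible direction (a sketch only, not anything bogofilter does
today): since the pathological inputs are long runs of a single repeated
byte, each decoded line could be run through a clamping filter before it
reaches the lexer, so the DFA never has to walk 600,000 identical
characters.  The function name clamp_runs, the MAX_RUN value, and the
integration point (e.g. somewhere near get_decoded_line) are all
assumptions.

/* Sketch: cap runs of one repeated byte at MAX_RUN before lexing.
 * Not bogofilter's actual code; MAX_RUN and the name are illustrative. */
#include <stddef.h>

#define MAX_RUN 30  /* longest run of one character to keep */

size_t clamp_runs(char *buf, size_t len)
{
    size_t in, out = 0, run = 0;

    for (in = 0; in < len; in++) {
        if (out > 0 && buf[in] == buf[out - 1])
            run++;              /* continuing the current run */
        else
            run = 0;            /* new character starts a new run */
        if (run < MAX_RUN)
            buf[out++] = buf[in];  /* keep; bytes past the cap are dropped */
    }
    return out;                 /* new length; caller re-terminates */
}

With a cap of a few dozen characters, ordinary tokens pass through
untouched while the 600,000-x chunk collapses to a constant-size input;
the cost is that unusually long legitimate runs (ASCII-art dividers, say)
would tokenize slightly differently.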
