Messages that slow bogofilter down (was: profiling)
David Relson
relson at osagesoftware.com
Thu Feb 20 23:50:23 CET 2003
Let's summarize:
message    lines    bytes     time
2.txt     67,718  5.214MB   10.90s
3.txt      1,399    106kb    5.66s
4.txt      8,161    630kb  151.13s
2.txt - mostly a large attachment - "TestReport.doc"; base64 encoding
3.txt - 100,000 repetitions of character 'x' in a mime chunk; quoted-printable
4.txt - 600,000 repetitions of character 'x' in a mime chunk; quoted-printable
The problem is that processing these messages takes too long. Profiling of
4.txt shows the time is spent inside low-level lexer code, presumably
matching 600,000 x's against the rule set in file lexer_text_plain.l.
Here are the first few lines of gprof's output:
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
89.12     374.17    374.17     8105     0.05     0.05  yy_get_previous_state
10.82     419.61     45.44     8105     0.01     0.01  yy_get_next_buffer
 0.03     419.72      0.11      103     0.00     4.08  text_plain_lex
 0.01     419.76      0.04     8167     0.00     0.00  xfgetsl
 0.01     419.79      0.03     8179     0.00     0.00  get_decoded_line
The question is: "How can bogofilter handle this faster?"
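One plausible direction (a sketch only, not bogofilter's actual code -- the
function name and the MAX_RUN cutoff are made up for illustration) is to
collapse pathologically long runs of a single repeated byte before the
decoded line reaches the lexer: a run of 16 identical characters carries the
same token information as a run of 600,000, and the DFA then only has to
walk the shortened buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Cap on how many consecutive identical bytes survive the pre-filter.
 * 16 is an arbitrary illustrative choice. */
#define MAX_RUN 16

/* Collapse every run of one repeated character longer than MAX_RUN
 * down to exactly MAX_RUN bytes, in place.  Returns the new length. */
static size_t collapse_runs(char *buf, size_t len)
{
    size_t out = 0, i = 0;
    while (i < len) {
        /* Measure the run of identical bytes starting at i. */
        size_t run = 1;
        while (i + run < len && buf[i + run] == buf[i])
            run++;
        /* Keep at most MAX_RUN of them. */
        size_t keep = run > MAX_RUN ? MAX_RUN : run;
        memmove(buf + out, buf + i, keep);
        out += keep;
        i += run;
    }
    return out;
}
```

Such a filter could be applied to each line returned by get_decoded_line, so
the quoted-printable 'x' bombs in 3.txt and 4.txt shrink from hundreds of
kilobytes to a few bytes before text_plain_lex ever sees them.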