bogofilter hogging cpu

Matthias Andree matthias.andree at gmx.de
Fri Nov 12 02:53:48 CET 2004


Matthias Andree <matthias.andree at gmx.de> writes:

> From my communications background, I'd think the analysis part is the
> expensive one, and that would be the lexer and similar components, but
> nothing short of a profile can provide that. That means compiling with
> -pg, running a while with your test load and then checking with gprof
> where most time is spent.

OK,

I used oprofile to gather this data, and there is little surprise.
Here are the top ten time consumers on my Athlon XP 2500+ when scoring
one Maildir and one mailbox (bogofilter -TM) against my production database:

  %   cumulative   self              self     total           
 time   samples   samples    calls  T1/call  T1/call  name    
 36.44  10891.00 10891.00                             yylex
 10.98  14174.00  3283.00                             msg_compute_spamicity
  6.65  16161.00  1987.00                             yyinput
  5.25  17729.00  1568.00                             get_token
  5.09  19249.00  1520.00                             base64_decode
  5.05  20759.00  1510.00                             word_cmp
  4.67  22154.00  1395.00                             xfgetsl
  4.16  23396.00  1242.00                             wordhash_standard_insert
  3.93  24572.00  1176.00                             ds_read
  2.28  25253.00   681.00                             db_get_dbvalue

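So the lexer is the dominant cost: yylex alone accounts for more than a
third of all samples, over three times the share of the next entry,
msg_compute_spamicity.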
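
To put a number on the lexer's share, here is a small Python sketch that
tallies the percentages from the table above. The figures are copied from
the profile; grouping yylex, yyinput and get_token together as
"lexer-related" is an assumption made for this tally, not something the
profiler reports, and the names PROFILE, LEXER_SYMBOLS and share are
made up for the example.

```python
# Rows copied from the flat profile above:
# (percent of samples, self samples, symbol name).
PROFILE = [
    (36.44, 10891, "yylex"),
    (10.98, 3283, "msg_compute_spamicity"),
    (6.65, 1987, "yyinput"),
    (5.25, 1568, "get_token"),
    (5.09, 1520, "base64_decode"),
    (5.05, 1510, "word_cmp"),
    (4.67, 1395, "xfgetsl"),
    (4.16, 1242, "wordhash_standard_insert"),
    (3.93, 1176, "ds_read"),
    (2.28, 681, "db_get_dbvalue"),
]

# Assumption: these three symbols are counted as lexer-related.
LEXER_SYMBOLS = {"yylex", "yyinput", "get_token"}


def share(symbols):
    """Sum the percent-of-samples column for the given symbol names."""
    return sum(pct for pct, _samples, name in PROFILE if name in symbols)


def main():
    all_listed = {name for _pct, _samples, name in PROFILE}
    print(f"lexer-related: {share(LEXER_SYMBOLS):.2f}%")
    print(f"top ten total: {share(all_listed):.2f}%")


if __name__ == "__main__":
    main()
```

Under that grouping, the three symbols come to roughly 48% of all
samples, and the ten listed functions together to about 84.5%.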

-- 
Matthias Andree


