register restructuring gives 60% time reduction on registering big mbox

Matthias Andree matthias.andree at gmx.de
Mon Nov 25 20:21:54 CET 2002


Hi,

I have some uncommitted speedup changes that give another 60% time
reduction when registering 17.7 MB (SI megabytes) of 2277 emails. On
my machine, it is down from (19.05 ± 0.1) s to (7.53 ± 0.05) s (4 runs
each). Yesterday at this time, the code took 22.97 s for the same task
(only 1 run). In all cases I only looked at the user time, because the
system time stayed at around 1.5 s no matter what I did.

I split "collect.c" off from "register.c". collect_words now processes
a single message and returns a boolean "more info available", i.e.
whether another message follows.

Register then merges the hash obtained for that message into the
bigger one, so that it only has to iterate over what was actually seen
in the message.

David's system tests were useful in finding an obscure bug I introduced
while splitting the code and changing the API. make distcheck passes
again now.

Oh, and bogofilter had to be adjusted: it now exits with code 2 when
it is passed more than one message for scoring (that doesn't make
sense anyway, and the check was trivial):
-> bogofilter: must get only one message to calculate spamicity!

Should I commit these changes, spawn a development branch for them, or
hold them back for now?


There is one more lexer fix that would have to be done: For reading from


