register restructuring gives 60% time reduction on registering big mbox

David Relson relson at osagesoftware.com
Mon Nov 25 20:44:17 CET 2002


At 02:21 PM 11/25/02, Matthias Andree wrote:

>Hi,
>
>I have some uncommitted speedup changes that give another 60% time
>reduction when registering 17.7 (SI-)MByte of mail (2277 emails). On my
>machine, it's down from (19.05 ± 0.1) s to (7.53 ± 0.05) s (4 runs
>each). Yesterday at this time, the code took 22.97 s to run for the same
>task (only 1 run). In any case, I only looked at the user time, because
>the system time was always around 1.5 s no matter what I did.
>
>I split "collect.c" off from "register.c"; collect_words now processes a
>single message and returns a boolean, "more info available".
>
>Register will then merge the obtained hash into the bigger one, so that
>it only has to iterate over what was seen in that message.
>
>David's system tests were useful in finding an obscure bug I made when
>splitting and changing the API. Now, make distcheck is fine again.
>
>Oh, bogoutil had to be adjusted: it now exits with code 2 when it is
>passed more than one message for scoring (that doesn't make sense
>anyway, and the check was trivial):
>-> bogofilter: must get only one message to calculate spamicity!
>
>Should I commit these changes, spawn a development branch for them, or
>hold them?

Put them into CVS.  They're an internal speed improvement, so they don't need 
external documentation and won't make life harder for the users.
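
For anyone following along, the collect/register split described above can 
be sketched roughly as below.  This is only an illustration under my own 
assumptions, not the code in collect.c or register.c: the wordtab type, its 
helpers, the function signatures, and the idea of passing messages as an 
array of strings are all hypothetical, and a small array stands in for the 
real word hash.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_WORDS 64
#define MAX_WORD_LEN 32

/* Toy stand-in for a word hash (hypothetical): (word, count) pairs. */
typedef struct {
    char word[MAX_WORDS][MAX_WORD_LEN];
    int  count[MAX_WORDS];
    int  used;
} wordtab;

/* Return the count stored for w, or 0 if w is not in the table. */
int wordtab_get(const wordtab *t, const char *w)
{
    assert(t != NULL && w != NULL);
    for (int i = 0; i < t->used; i++)
        if (strcmp(t->word[i], w) == 0)
            return t->count[i];
    return 0;
}

/* Add n to the count for w, creating the entry if there is room. */
void wordtab_add(wordtab *t, const char *w, int n)
{
    assert(t != NULL && w != NULL);
    for (int i = 0; i < t->used; i++) {
        if (strcmp(t->word[i], w) == 0) {
            t->count[i] += n;
            return;
        }
    }
    if (t->used < MAX_WORDS) {
        strncpy(t->word[t->used], w, MAX_WORD_LEN - 1);
        t->word[t->used][MAX_WORD_LEN - 1] = '\0';
        t->count[t->used] = n;
        t->used++;
    }
}

/* Hypothetical collect_words: tokenize the single message msgs[*next]
 * into `out`, advance *next, and return true while more messages remain. */
bool collect_words(const char *const *msgs, int nmsgs, int *next, wordtab *out)
{
    char buf[256];

    out->used = 0;                       /* start with an empty table */
    if (*next >= nmsgs)
        return false;
    strncpy(buf, msgs[*next], sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    for (char *tok = strtok(buf, " \n"); tok != NULL; tok = strtok(NULL, " \n"))
        wordtab_add(out, tok, 1);
    (*next)++;
    return *next < nmsgs;
}

/* Register loop: collect one message, then merge its small table into
 * the big one, visiting only the words that message contained.
 * Returns the number of messages registered. */
int register_messages(const char *const *msgs, int nmsgs, wordtab *total)
{
    wordtab one;
    int next = 0, registered = 0;
    bool more;

    if (nmsgs <= 0)
        return 0;
    do {
        more = collect_words(msgs, nmsgs, &next, &one);
        for (int i = 0; i < one.used; i++)
            wordtab_add(total, one.word[i], one.count[i]);
        registered++;
    } while (more);
    return registered;
}
```

The point of the sketch is only the shape of the loop: the merge step walks 
the small per-message table, never the whole accumulated one.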

As I recall, you were planning some Solaris testing this evening.  After 
that's done, I'll cut another beta, i.e. 0.9.0.3.  Barring unexpected 
excitement in the next couple of days, we can promote today's beta to 
0.9.1-stable towards the end of the week.

>There is one more lexer fix that would have to be done: For reading from
>a Maildir/, we must suppress the "^From " check, because Maildir/ is
>fully transparent to message contents.

Why?  I thought that check provides the message separator when processing 
mailboxes for training.
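
To make the question concrete, here is a minimal sketch of the kind of check 
under discussion.  It assumes a hypothetical input_type flag that tells the 
lexer where its input comes from; neither the flag nor the function name is 
taken from the bogofilter sources.  In an mbox, a line starting with "From " 
separates messages; in a Maildir, each message is its own file, so such a 
line can only be message content.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical input kinds; the names are illustrative only. */
typedef enum { INPUT_MBOX, INPUT_MAILDIR } input_type;

/* Return true when `line` should be treated as the start of a new message. */
bool is_message_separator(const char *line, input_type type)
{
    assert(line != NULL);
    if (type == INPUT_MAILDIR)
        return false;   /* one file per message: "From " is ordinary text */
    return strncmp(line, "From ", 5) == 0;
}
```

With this, is_message_separator("From relson Mon Nov 25", INPUT_MBOX) is 
true, while the same line read from a Maildir file is left alone.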

>Is this fix post-0.9.X.gold stuff or still in time?

I've re-opened my charset project, but I don't want to rush it just to get it 
into 0.9.1-gold.




