speed [was: token pairs]

David Relson relson at osagesoftware.com
Wed Apr 14 02:48:35 CEST 2004


On Tue, 13 Apr 2004 19:19:06 +0200
Boris 'pi' Piwinger wrote:

> David Relson <relson at osagesoftware.com> wrote:
> 
> >I'm not willing to include word pairs until after the 1.0 release,
> >but am willing to let users experiment with the technique.
> 
> I am just running a test on this. It took about two and a
> half hours to bulk verify (-Mv) 25k messages. Interestingly
> enough, processing those same messages individually took only
> about 800 seconds! This suggests there is a bug somewhere,
> although the results themselves are plausible.
> 
> pi

pi,

As a quick test, using a set of 112 messages (listed in file "l"), I ran
bogofilter with and without the "-P" switch and used "time" to print the
info.  Here are the results:

[relson at osage src]$ time bogofilter -C -b < l
Command exited with non-zero status 1
1.93user 0.32system 0:02.43elapsed 92%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (349major+1506minor)pagefaults 0swaps

[relson at osage src]$ time bogofilter -C -b < l -P
Command exited with non-zero status 1
2.52user 0.52system 0:03.21elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (349major+2254minor)pagefaults 0swaps

Using token pairs increased the user time from 1.93 to 2.52 seconds,
about 31% more.  Given the additional work of creating token pairs and
looking them up, this seems reasonable.
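
For anyone who wants to put the two runs side by side, here is a rough
sketch of the arithmetic.  The numbers are copied from the "time" output
above; the helper names (overhead_percent, per_message_ms) are made up
for illustration and are not part of bogofilter.

```python
# Timings copied from the two "time" runs above (112 messages each).
MESSAGES = 112
baseline = {"user": 1.93, "system": 0.32, "elapsed": 2.43}    # without -P
with_pairs = {"user": 2.52, "system": 0.52, "elapsed": 3.21}  # with -P


def overhead_percent(before: float, after: float) -> float:
    """Relative increase of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


def per_message_ms(seconds: float, count: int) -> float:
    """Average cost per message, in milliseconds."""
    return seconds / count * 1000.0


for key in baseline:
    pct = overhead_percent(baseline[key], with_pairs[key])
    print(f"{key:8s} {baseline[key]:.2f}s -> {with_pairs[key]:.2f}s "
          f"(+{pct:.1f}%)")

print(f"user time per message: "
      f"{per_message_ms(baseline['user'], MESSAGES):.1f} ms without -P, "
      f"{per_message_ms(with_pairs['user'], MESSAGES):.1f} ms with -P")
```

With only 112 messages and a single run of each, these figures are a
rough indication rather than a benchmark; elapsed time rises by about the
same proportion as user time (roughly 32%).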

David



