Crm114-like Phrases and partial phrases; database size

michael at optusnet.com.au
Tue May 20 00:29:08 CEST 2003


Greg Louis <glouis at dynamicro.on.ca> writes:
> On 20030518 (Sun) at 1914:41 -0400, Greg Louis wrote:
> 
> > > Database size is a _major_ potential problem.  [...]
[...]
> Some people will consider that the database size expansion is
> sufficiently undesirable to outweigh the improvement in discrimination. 
> Throughput might become a problem as well, especially for larger
> installations.

I don't know if I count as a 'larger installation' or not (I'm planning
to use it to filter about 3-5 million emails per day), but some thoughts:

Given sufficient RAM, the per-message lookup cost should grow only with
the log of the database size, so a 10-fold increase in size should
mean only about a 20% drop in throughput.
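A minimal sketch of the arithmetic behind that estimate. It assumes (hypothetically) that each lookup costs time proportional to log(N), where N is the number of entries in the database, and that the whole database fits in RAM. Under that model the ratio of new to old throughput is log(N)/log(10N), so the size of the drop depends on where you start: it comes out at exactly 20% for a 10,000-entry database and shrinks as the database gets larger. The function name `throughput_ratio` and the entry counts are illustrative only, not taken from Bogofilter.

```python
import math


def throughput_ratio(old_entries: int, growth: float) -> float:
    """Return new/old throughput if each lookup costs c * log(entries).

    Hypothetical model: throughput is inversely proportional to the
    per-lookup cost, so the constant c cancels out of the ratio.
    """
    old_cost = math.log(old_entries)
    new_cost = math.log(old_entries * growth)
    return old_cost / new_cost


# Illustrative starting sizes; the post does not give an actual entry count.
for entries in (10_000, 100_000, 1_000_000):
    ratio = throughput_ratio(entries, 10)
    print(f"{entries:>9,} entries, 10x growth: "
          f"{(1 - ratio) * 100:.1f}% lower throughput")
```

For 10,000, 100,000 and 1,000,000 starting entries this prints drops of 20.0%, 16.7% and 14.3% respectively, which is why the log model makes a large database relatively cheap once RAM is not the bottleneck.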

The other point I'd mention is that accuracy matters. At the volume
I'm interested in, even a relatively small improvement in accuracy
would matter a fair bit, and paying a bit in database size is a cheap
price for that.
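A rough back-of-the-envelope sketch of why small accuracy gains matter at this volume. The daily volumes (3 and 5 million) come from the post; the 1.0% and 0.8% error rates are purely hypothetical, chosen only to show the scale of a 0.2 percentage point improvement. The helper `errors_per_day` is an illustrative name, not part of any tool.

```python
def errors_per_day(messages_per_day: int, error_rate: float) -> float:
    """Expected number of misclassified messages per day."""
    return messages_per_day * error_rate


# Daily volumes from the post; both error rates are hypothetical.
for volume in (3_000_000, 5_000_000):
    before = errors_per_day(volume, 0.010)  # assumed 1.0% error rate
    after = errors_per_day(volume, 0.008)   # assumed 0.8% error rate
    print(f"{volume:,} msgs/day: {before - after:,.0f} fewer errors per day")
```

Under these assumptions the improvement works out to 6,000 fewer misclassified messages per day at 3 million messages and 10,000 at 5 million.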

Speaking personally, as long as the database size comes in under about
2-3 gigs, I'm going to be happy. :)

Michael.




More information about the Bogofilter mailing list