[PATCH] combined wordlist a.k.a. single list

Malcolm Dew-Jones yf110 at victoria.tc.ca
Mon Jun 2 21:12:38 CEST 2003




On Mon, 2 Jun 2003, Greg Louis wrote:

> On 20030602 (Mon) at 1046:22 -0700, Malcolm Dew-Jones wrote:
> > 
> > My trivial example looks like it would be larger when combined.  It
> 
> You haven't considered the indexing overhead involved in keeping two
> separate databases, but I understand the point.

I thought about it very briefly, but I had no idea how much difference it
would make. It appears the overhead matters more than the data in this
case (which is fine, since the data in question is small).


> Hm.  Your goodlist packs 21 times the tokens I've got into 55 times the
> space.  It might take a while, but I would like to suggest that you try
> 
> for l in spam good; do
>     bogoutil -d ${l}list.db | bogoutil -l ${l}list.new
>     db_verify ${l}list.new
> done
> 

I will definitely play with that, thanks.  The production systems use
bogofilter 0.7.  The only issue we have with that version is that base64
is not checked.  The sample databases built with 0.7 (based on the same
messages) are about 1/10th the size of those built with the new version
(0.11.2), and I have just started to investigate why and what best to do
about it.
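
For the dump-and-reload test suggested above, something like the following
could show whether the rebuilt files actually come out smaller.  This is
only a sketch: report_sizes is a made-up helper, and the file names assume
the ${l}list.db / ${l}list.new naming used in the quoted loop.

```shell
# Hypothetical helper: print "name size-in-bytes" for each existing file.
report_sizes() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            # "wc -c < file" prints only the byte count; tr strips padding
            printf '%s %s\n' "$f" "$(wc -c < "$f" | tr -d ' ')"
        fi
    done
}

# Assumed file names, taken from the loop above; missing files are skipped.
report_sizes spamlist.db spamlist.new goodlist.db goodlist.new
```

Comparing each .db line with its .new line should give a rough idea of how
much of the space is indexing overhead rather than token data.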

Thanks.




