compactifying the database

David Relson relson at osagesoftware.com
Sun Mar 19 23:41:32 CET 2006


On Mon, 20 Mar 2006 00:20:32 +0200
Anatoly Vorobey wrote:

> You wrote on Wed, Mar 15, 2006 at 11:14:04AM +0100:
> > > It does seem to slow down a bit after a week of running. wordlist.db
> > > is 17MB now and I thought it might need to be compacted somehow.
> > > However, the bf_compact-sqlite script included in the package doesn't
> > > work, as it tries to call * rather than *-sqlite programs; and when I
> > > changed it to call the right executables, bogofilter complains that it
> > > doesn't understand --db-transaction.
> > 
> > Well, the script shouldn't be passing that option. Is "TXN" set in the
> > environment? If so, unset it. Else, check what's in your ~/.bogofilter
> > directory. The -sqlite package should not be dropping log.* files which
> > cause bf_compact to add --db-transaction.
> > 
> > If it's growing too fast, remove the "-u" option from bogofilter calls.
> 
> What's "too fast"? 
> 
> > Oh, and ubuntu shouldn't be renaming the executables (or should patch
> > the scripts) if they intend to keep the scripts working. If that's
> > really the problem, please file a bug report with Ubuntu.
> 
> They're renaming both scripts and executables, but apparently not
> patching the scripts. I'll check if the latest package still has the
> problem, and will file a bug report if needed. 

Sounds good!
 
> My question really was: I don't know what compactifying does in the
> Berkeley version; maybe its whole purpose is to do something to the
> transaction logs, and then there's no meaning to even trying to invoke
> it (after fixing the script) in the sqlite version.

As you know, a database is a file composed of blocks of data, with each
block containing one or more database records.  In a newly built (or
compacted) database the tokens are stored in alphabetical order and the
blocks are all full.

If you then _add_ a token, it belongs in a block that is already full,
so BerkeleyDB splits that block into 2 blocks and adds the token to one
of them.  The result is 2 non-full blocks.  When the next token is
added, it will either go into 1 of the non-full blocks or cause another
full block to be split.  Over time, as tokens are added, the amount of
empty space in the database grows.  Compacting builds a new database
without the empty space.
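
To make the splitting behaviour concrete, here is a rough toy model in
Python.  It is only a sketch of the idea, not BerkeleyDB's actual
algorithm: the block capacity of 8 records, the integer keys and the
helper names (build_compact, insert, fill_factor) are all made up for
illustration.

```python
import bisect
import random

CAPACITY = 8  # hypothetical number of records that fit in one block


def build_compact(keys, capacity=CAPACITY):
    """Pack sorted keys into blocks that are as full as possible."""
    keys = sorted(keys)
    return [keys[i:i + capacity] for i in range(0, len(keys), capacity)]


def insert(blocks, key, capacity=CAPACITY):
    """Insert key into the block covering its range; split the block if full."""
    firsts = [b[0] for b in blocks]
    # Last block whose first key is <= key (or block 0 for a very small key).
    idx = max(bisect.bisect_right(firsts, key) - 1, 0)
    block = blocks[idx]
    if len(block) >= capacity:
        mid = len(block) // 2
        left, right = block[:mid], block[mid:]
        blocks[idx:idx + 1] = [left, right]  # one full block becomes two half-full ones
        block = right if key >= right[0] else left
    bisect.insort(block, key)


def fill_factor(blocks, capacity=CAPACITY):
    """Fraction of record slots in use across all blocks."""
    used = sum(len(b) for b in blocks)
    return used / (len(blocks) * capacity)


random.seed(0)
initial = random.sample(range(1_000_000), 800)
blocks = build_compact(initial)        # freshly built: every block is full
before = fill_factor(blocks)

known = set(initial)
added = [k for k in random.sample(range(1_000_000), 400) if k not in known]
for k in added:
    insert(blocks, k)                  # each insert may split a full block
after = fill_factor(blocks)

all_keys = [k for b in blocks for k in b]
compacted = build_compact(all_keys)    # "compacting": rebuild without the gaps

print(f"fill before inserts: {before:.2f}")
print(f"fill after inserts:  {after:.2f} in {len(blocks)} blocks")
print(f"after compacting:    {fill_factor(compacted):.2f} in {len(compacted)} blocks")
```

The same tokens end up in fewer blocks after the rebuild, which is the
whole effect of compacting in this model.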

If you run db_stat (which is part of the BerkeleyDB package) with its
-d option on the database file, it'll give you a lot of info about your
database, including the number of pages and how full they are.

> > > My question: is there a need/an ability to compact the database in the
> > >SQLite version, and if so, how?
> > 
> > Well, you won't gain much, unless you removed tokens "big time"
> > with the maintenance options -- after that, an SQL "VACUUM" command
> > probably would not hurt, which would just be:
> > 
> >     sqlite3 ~/.bogofilter/wordlist.db 'VACUUM;'
> 
> Thanks. Guess I need to look into removing tokens "big time", whatever
> that means.

Bogoutil has a maintenance mode which provides some useful
capabilities.  Check its help message and man page and the bogofilter
FAQ for more info.
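
As for the VACUUM suggestion quoted above, the following Python sketch
shows the effect on a throwaway SQLite file.  The "tokens" table and its
columns are invented for the demo and are not bogofilter's real schema,
and it assumes SQLite's default setting where auto_vacuum is off, so
treat it as an illustration only.

```python
import os
import shutil
import sqlite3
import tempfile

tmpdir = tempfile.mkdtemp()
db_path = os.path.join(tmpdir, "demo.db")  # scratch file, not a real wordlist

# isolation_level=None puts the connection in autocommit mode, so VACUUM
# is never issued inside an open transaction.
conn = sqlite3.connect(db_path, isolation_level=None)
conn.execute(
    "CREATE TABLE tokens (token TEXT PRIMARY KEY, spam INTEGER, good INTEGER)"
)

# Load a batch of made-up tokens inside one explicit transaction.
conn.execute("BEGIN")
conn.executemany(
    "INSERT INTO tokens VALUES (?, ?, ?)",
    ((f"token{i:06d}", i % 7, i % 3) for i in range(20000)),
)
conn.execute("COMMIT")
size_full = os.path.getsize(db_path)

# Remove tokens "big time": about 6 out of every 7 rows.
conn.execute("DELETE FROM tokens WHERE spam < 6")
size_after_delete = os.path.getsize(db_path)

# Freed pages stay in the file until VACUUM rebuilds it.
conn.execute("VACUUM")
size_after_vacuum = os.path.getsize(db_path)

remaining = conn.execute("SELECT COUNT(*) FROM tokens").fetchone()[0]
conn.close()
shutil.rmtree(tmpdir)

print("full:", size_full)
print("after delete:", size_after_delete)
print("after vacuum:", size_after_vacuum)
print("rows left:", remaining)
```

Deleting rows alone should leave the file about the same size; only the
VACUUM step gives the space back, which is why it is worth running just
after a large purge.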

HTH,

David


