t.bulkmode problem

Matthias Andree matthias.andree at gmx.de
Tue Nov 23 15:39:25 CET 2004


Tom Anderson <tanderso at oac-design.com> writes:

> I haven't really been following this too much, but I hope my suggestion
> is useful.  Would it be possible to access separate environments
> sequentially instead of concurrently, and would that solve the multiple
> locking problem?  In other words, do the changes in multiple
> environments need to be a single atomic transaction, or could it be
> split into one atomic transaction per environment?  Maybe even have the
> program call itself recursively?

Good idea, thank you.

The general problem is, in short:

1. a transactional or concurrent database file needs a _writable_
   environment (__db.*, log.*, lockfile-*) even for read-only access

2. we need to read a token and the corresponding .MSG_COUNT token in the
   same transaction, or with a database that cannot change.

3. we are currently reading all tokens, sorting them lexicographically
   (to profit from B-whatever-tree locality of lexicographically
   short-distance tokens, with proven significant benefits)
   and then for each token trying all lists in order of their
   preference. Lists at the same preference are accessed in order of
   appearance in the configuration; whether that is forward or reverse
   order, I haven't checked.
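
To make point 3 concrete, here is a rough Python sketch of the current
lookup order. It is not bogofilter code: the function name
lookup_current, the (preference, entries) pairs and the plain dicts
standing in for wordlists are all made up for illustration. It also
assumes that a lower preference number is tried first, that the first
list containing a token wins, and that ties follow forward
configuration order (which, as said, is unchecked).

```python
def lookup_current(tokens, wordlists):
    """Look up each token in every wordlist, in order of preference.

    wordlists is a list of (preference, entries) pairs in configuration
    order; entries maps token -> (spam_count, ham_count).  All names
    here are hypothetical stand-ins, not bogofilter's data structures.
    """
    # sorted() is stable, so lists with equal preference keep their
    # configuration order (assumed to be "forward" in this sketch).
    ordered = sorted(wordlists, key=lambda wl: wl[0])

    result = {}
    # Sort the tokens so that neighbouring lookups hit nearby keys.
    for token in sorted(set(tokens)):
        for _preference, entries in ordered:
            if token in entries:
                result[token] = entries[token]
                break  # assumption: first list that knows the token wins
    return result
```

Note that in this scheme every wordlist has to be open at the same
time, which is where the one-environment-per-process locking gets in
the way.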

What you suggest would mean that we:

a. read all tokens and sort the list
b. open the first wordlist/environment for reading
c. gather spam/ham probabilities for all tokens listed in that list
   and delete them from the sorted list
d. close the wordlist
e. repeat b - d for subsequent wordlists until the list is exhausted.
f. if -u mode is in effect, re-open first ("default") wordlist for update
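
Steps a to e could look roughly like the Python sketch below. Again
this is only an illustration under assumptions: open_wordlist,
lookup_sequential, the store dict and the log list are hypothetical
names, and a dict stands in for each wordlist/environment. The point
is only that at most one wordlist is open at any time. Step f (the
update in -u mode) is left out.

```python
from contextlib import contextmanager


@contextmanager
def open_wordlist(store, name, log):
    """Hypothetical stand-in for opening one wordlist/environment."""
    log.append(("open", name))
    try:
        yield store[name]
    finally:
        log.append(("close", name))  # step d: close the wordlist


def lookup_sequential(tokens, store, names, log=None):
    """Visit the wordlists one after another, never two at once.

    store maps wordlist name -> {token: (spam_count, ham_count)};
    names gives the order in which the lists are tried.
    Returns (found, missing).
    """
    if log is None:
        log = []
    pending = sorted(set(tokens))            # step a
    found = {}
    for name in names:                       # step e
        if not pending:
            break                            # token list is exhausted
        with open_wordlist(store, name, log) as entries:   # step b
            still_pending = []
            for token in pending:            # step c
                if token in entries:
                    found[token] = entries[token]
                else:
                    still_pending.append(token)
            pending = still_pending
    return found, pending
```

Tokens that no wordlist knows come back in the second return value, so
the caller can treat them as unknown.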

This should be doable, but I cannot yet estimate if that would be more
or less effort than finishing multiple-environment support.

For full multiple-environment support, the locking scheme will have to
be rewritten to some extent; it currently supports exactly one
environment per process.

David, your opinion is also solicited :)

-- 
Matthias Andree


