bogotune broken? Larger data* revision committed to CVS.

Matthias Andree matthias.andree at gmx.de
Sat Nov 20 16:33:33 CET 2004


David Relson <relson at osagesoftware.com> writes:

> I maintain that multiple wordlists are of value.  Requiring the "same
> BDB environment" doesn't affect me personally.  However, one of the
> goals of multiple wordlists is to enable "user" and "system" lists,
> which _can't_ be in the same directory (though I suppose that a
> common environment would be possible).

I'll try in different words, in case I was too tired to word this
properly. We cannot use direct Berkeley DB file system access to access
a database that is owned by a different user, not even for read-only
access. We'll need to lock somebody else's file for the read, and while
this may be possible with olddb, I could just place a lock on your file
and thus prevent you from ever updating the database. Without locking,
you may be reading half stale data, half updated, when the database
owner commits a larger update.

We need to use a different means to access somebody else's database, and
inter-process communication comes to mind. For instance RPC, FIFO or
unix-domain or perhaps IP sockets.

>> - global variables, such as dbe, bogohome, word_lists, message counts,
>>   robs, robx - the references are hard to chase, and there is no
>>   technical reason to use these. (dbe is now gone, message counts
>>   halfway).
>
> While there are well-known issues with global variables, moving them all
> into parameters (or a structure that is passed to everyone) isn't a
> whole lot better.  AFAICT it just changes the appearance of the code,
> not the maintainability.

At least it allows for a good deal of pure functions (those that don't
cause effects outside the parameters), which is one of the major reasons
for my urging of getting rid of all globals. db_lock.c is "impure",
accessing and changing globals, and while it simplifies the direct
function interfaces, it is a PITA if we decide to allow multiple
environments in the code.

> As the globals tend to have primary users, one idea is to declare them
> in the appropriate file and implement accessor functions.  It might be
> better, or it might not be.

I'm not in favor of accessor functions for static variables, we aren't
gaining anything through them except additional complexity. We may as
well do it right and allocate dynamically.

> As you say, declaring them isn't a necessity (except for the one case). 
> Not declaring them puts restriction on their placement in the
> source code.  Declaring them gives a quick summary of what's in
> the file and allows them to be ordered rationally, i.e.
> alphabetically.

Presentation is a matter of the editor software. emacs's speedbar for
one. I rarely access functions by scrolling, I usually access via tags -
way faster regardless of where the function is, and the static
declarations confuse the tags generator.

>> - atexit() handlers - they always run when they aren't supposed to,
>>   causing bogus errors, wasting maintainer time.
>
> The handler was needed to ensure the database gets closed.  If that
> assurance is no longer necessary, it can be dropped.

We may rather need to fix the paths out of our software for those error
conditions that are recoverable and/or do not damage the database.

"make check" doesn't appear to run exit paths that fail to close the
databases, except perhaps t.bulkmode. You'll find out.

>> BTW, olddb takes the same time, also 6 s. No more performance
>> advantage without transactions for scoring. And no, I'm not going to
>> benchmark -u- it is the main cause for fast (almost explosively)
>> growing logs and slower operation.
>
> There are those of us who use 'u' and find it valuable.  Given all the
> hard work you've done to enable concurrent reads and writes, I'd think
> you'd like this feature since it exercises the new locking
> capability.

Not really, it'll rather exercise our error handling in ds_txn_commit()
and its callees.  It's all in the same transaction and actually somewhat
prone to deadlock avoidance TXN aborts.

We may need something similar to t.lock3 but for bogofilter -u to
provoke collisions and see if the programs properly retry the whole
"score/register" process. They do not do this at this time but will
rather exit with EX_ERROR and leave it to the user's setup to handle
this.

Which, BTW, makes me think again about the procmail/maildrop
examples. We should only give examples that defer mail when something's
wrong, hence no more

       :0HB:
       * ? bogofilter
       spam

examples. We want exit code 75 from procmail when bogofilter failed.
This has been standing for too long. The drawback is documented, but I
wonder if we should move the bogofilter -ep examples up front in the manual.

-- 
Matthias Andree



More information about the bogofilter-dev mailing list