bogotune broken? Larger data* revision committed to CVS.

David Relson relson at osagesoftware.com
Sun Nov 21 00:46:49 CET 2004


On Sat, 20 Nov 2004 16:33:33 +0100
Matthias Andree wrote:

...[snip]....

> I'll try in different words, in case I was too tired to word this
> properly. We cannot use direct Berkeley DB file system access to
> access a database that is owned by a different user, not even for
> read-only access. We'll need to lock somebody else's file for the
> read, and while this may be possible with olddb, I could just place a
> lock on your file and thus prevent you from ever updating the
> database. Without locking, you may be reading half stale data, half
> updated, when the database owner commits a larger update.
> 
> We need to use a different means to access somebody else's database,
> and inter-process communication comes to mind. For instance RPC, FIFO
> or unix-domain or perhaps IP sockets.

Ah!  So the problem is an inherent limitation of locking.  Why not
document it, so bogofilter admins will know it and can choose to live
with the limitation.

Bogofilter only writes to 1 wordlist in any given execution.  When
dealing with multiple wordlists, only the first "regular", i.e.
non-ignore, list is opened with write permission.  All others are always
opened read-only.  I wonder if the code can take advantage of that
knowledge.

> >> - global variables, such as dbe, bogohome, word_lists, message
> >counts,>   robs, robx - the references are hard to chase, and there
> >is no>   technical reason to use these. (dbe is now gone, message
> >counts>   halfway).
> >
> > While there are well-known issues with global variables, moving them
> > all into parameters (or a structure that is passed to everyone)
> > isn't a whole lot better.  AFAICT it just changes the appearance of
> > the code, not the maintainability.
> 
> At least it allows for a good deal of pure functions (those that don't
> cause effects outside the parameters), which is one of the major
> reasons for my urging of getting rid of all globals. db_lock.c is
> "impure", accessing and changing globals, and while it simplifies the
> direct function interfaces, it is a PITA if we decide to allow
> multiple environments in the code.

Having pure functions is of value.  Putting lots of (formerly) global
variables into a structure to purify the functions doesn't seem
valuable.  I'd rather see accessor functions defined as inline
functions.  At least, grep could then be used to determined whether the
variable is being read or modified.

> > As the globals tend to have primary users, one idea is to declare
> > them in the appropriate file and implement accessor functions.  It
> > might be better, or it might not be.
> 
> I'm not in favor of accessor functions for static variables, we aren't
> gaining anything through them except additional complexity. We may as
> well do it right and allocate dynamically.

"right" and "dynamically" would be accurate if we were using an object
oriented language with class variables.  As we're not doing that,
whatever we have is a compromise -- and I can live with that fact.

> > As you say, declaring them isn't a necessity (except for the one
> > case). Not declaring them puts restriction on their placement in the
> > source code.  Declaring them gives a quick summary of what's in
> > the file and allows them to be ordered rationally, i.e.
> > alphabetically.
> 
> Presentation is a matter of the editor software. emacs's speedbar for
> one. I rarely access functions by scrolling, I usually access via tags
> - way faster regardless of where the function is, and the static
> declarations confuse the tags generator.

The compiler imposes its own ordering requirements.  For example, that's
why main() is usually at the end of a file even though it's the first
function to execute.  Including forward declarations allows functions to
be ordered as seems most logical or convenient.

> >> - atexit() handlers - they always run when they aren't supposed to,
> >>   causing bogus errors, wasting maintainer time.
> >
> > The handler was needed to ensure the database gets closed.  If that
> > assurance is no longer necessary, it can be dropped.
> 
> We may rather need to fix the paths out of our software for those
> error conditions that are recoverable and/or do not damage the
> database.

atexit() provides a poor man's form of exceptions.  'Tis too bad that
proper exception handling requires a language like C++ (which would cost
us portability).

> "make check" doesn't appear to run exit paths that fail to close the
> databases, except perhaps t.bulkmode. You'll find out.

I'll tweak bogofilter to let us know when (if) it reaches the exit
handler before closing the databases.

> >> BTW, olddb takes the same time, also 6 s. No more performance
> >> advantage without transactions for scoring. And no, I'm not going
> >to> benchmark -u- it is the main cause for fast (almost explosively)
> >> growing logs and slower operation.
> >
> > There are those of us who use 'u' and find it valuable.  Given all
> > the hard work you've done to enable concurrent reads and writes, I'd
> > think you'd like this feature since it exercises the new locking
> > capability.
> 
> Not really, it'll rather exercise our error handling in
> ds_txn_commit() and its callees.  It's all in the same transaction and
> actually somewhat prone to deadlock avoidance TXN aborts.
> 
> We may need something similar to t.lock3 but for bogofilter -u to
> provoke collisions and see if the programs properly retry the whole
> "score/register" process. They do not do this at this time but will
> rather exit with EX_ERROR and leave it to the user's setup to handle
> this.
> 
> Which, BTW, makes me think again about the procmail/maildrop
> examples. We should only give examples that defer mail when
> something's wrong, hence no more
> 
>        :0HB:
>        * ? bogofilter
>        spam
> 
> examples. We want exit code 75 from procmail when bogofilter failed.
> This has been standing for too long. The drawback is documented, but I
> wonder if we should move the bogofilter -ep examples up front in the
> manual.

Go for it!



More information about the bogofilter-dev mailing list