Data Coherency [was: ...]

Gyepi SAM gyepi at praxis-sw.com
Sun Sep 22 05:36:05 CEST 2002


On Sat, Sep 21, 2002 at 07:14:39PM -0400, David Relson wrote:
> At 04:36 PM 9/21/02, Mark Hoffman wrote:
> >Hi David:
> >
> >... [snip] ...
> >
> >I am thinking about a test driver that trains multiple instances of
> >bogofilter in an attempt to reproduce the data coherency problem with
> >message counts.  Any thoughts on that?  IIRC, that guy had a SMP box.
> >I can get one up and running here if I have to...

> I already sent Jeremy Blosser a patch that would use a DB token named 
> ".count" to save the count.

I just tested that patch by running 10 simultaneous instances of bogofilter -s,
and indeed, the original problem has been solved. However, another problem manifests
itself: the word and message counts are not updated atomically, so the concurrent
instances logged many miscounts. Clearly, the locking mechanism is broken. The code currently
prefers flock over fcntl; I think the reverse would be better, since fcntl is more widely
available (Stevens, APUE).
I understand that flock was chosen because it is supported by scripting languages, but then,
so is fcntl.  In any case, I have corrected the problem by using blocking read and write locks
via fcntl, which follows the multiple-readers, single-writer paradigm.
So far it looks good, but I need to do more testing. I'll post the code as soon as it is done.
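
To make the multiple-readers, single-writer idea concrete, here is a minimal sketch in Python rather than bogofilter's C, so it is not the actual fix. It assumes the count lives in a small text file holding one integer; the function names read_count and add_to_count are made up for illustration. Python's fcntl.lockf() wraps the fcntl() locking calls: LOCK_SH is the shared (read) lock, LOCK_EX the exclusive (write) lock, and both block until granted.

```python
import fcntl
import os
import tempfile


def read_count(path):
    """Read the count under a shared lock; many readers may hold it at once."""
    with open(path, "r") as f:
        fcntl.lockf(f, fcntl.LOCK_SH)      # blocks while a writer holds the lock
        try:
            text = f.read().strip()
            return int(text) if text else 0
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)


def add_to_count(path, delta):
    """Do the whole read-modify-write under one exclusive lock."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as f:
        fcntl.lockf(f, fcntl.LOCK_EX)      # blocks until no reader or writer remains
        try:
            text = f.read().strip()
            value = (int(text) if text else 0) + delta
            f.seek(0)
            f.truncate()
            f.write(f"{value}\n")
            f.flush()                      # push the data out before unlocking
            return value
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "msgcount")
    add_to_count(demo_path, 1)
    add_to_count(demo_path, 1)
    print(read_count(demo_path))
```

The point of the sketch is that the read and the write happen inside the same exclusive lock, so two writers cannot both read the old value and then overwrite each other's update.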


> getcount() would first look in the DB.  If found, great; if not found, it'd 
> read the file and output the value to the DB.
> 
> putcount() would output the value to the DB.

We seem to be working on similar things here.
I have now implemented the same thing. Perhaps we need a code merge?
BTW, I called the token '.MSG_COUNT' just in case we start accepting words that begin with a period.
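
For reference, the getcount()/putcount() scheme described above might look roughly like this. It is a sketch in Python, not the patch itself: the db argument is any dict-like store standing in for the real database handle, the legacy count file is assumed to hold a single decimal integer, and treating a missing file as a count of 0 is also an assumption. Only the token name '.MSG_COUNT' comes from this message.

```python
import os
import tempfile

MSG_COUNT_TOKEN = ".MSG_COUNT"   # token name mentioned in this message


def putcount(db, count):
    """Store the message count in the DB under the reserved token."""
    db[MSG_COUNT_TOKEN] = str(count)


def getcount(db, legacy_path):
    """Return the count from the DB, migrating it from the old file if needed."""
    if MSG_COUNT_TOKEN in db:
        return int(db[MSG_COUNT_TOKEN])

    # Not in the DB yet: fall back to the old count file (assumed format:
    # one integer), then write the value to the DB so later calls find it.
    count = 0
    if os.path.exists(legacy_path):
        with open(legacy_path) as f:
            text = f.read().strip()
            count = int(text) if text else 0
    putcount(db, count)
    return count


if __name__ == "__main__":
    legacy = os.path.join(tempfile.mkdtemp(), "count")
    with open(legacy, "w") as f:
        f.write("42\n")
    db = {}                       # stand-in for the real database
    print(getcount(db, legacy))   # first call migrates the value from the file
    putcount(db, 43)
    print(getcount(db, legacy))   # later calls read from the DB only
```

Note that this does nothing about locking by itself; the lookup and the update would still need to happen under the appropriate read or write lock.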


-Gyepi


For summary digest subscription: bogofilter-digest-subscribe at aotto.com


