Data Coherency [was: ...]
Gyepi SAM
gyepi at praxis-sw.com
Sun Sep 22 05:36:05 CEST 2002
On Sat, Sep 21, 2002 at 07:14:39PM -0400, David Relson wrote:
> At 04:36 PM 9/21/02, Mark Hoffman wrote:
> >Hi David:
> >
> >... [snip] ...
> >
> >I am thinking about a test driver that trains multiple instances of
> >bogofilter in an attempt to reproduce the data coherency problem with
> >message counts. Any thoughts on that? IIRC, that guy had a SMP box.
> >I can get one up and running here if I have to...
> I already sent Jeremy Blosser a patch that would use a DB token named
> ".count" to save the count.
I just tested that patch by running 10 simultaneous instances of bogofilter -s,
and indeed, the original problem has been solved. However, another problem shows up:
updates to the word and message counts are not atomic, so the concurrent
instances logged many miscounts. Clearly, the locking mechanism is broken. The code currently
prefers flock to fcntl; I think the reverse would be better, since fcntl is more widely
available (Stevens, APUE). I understand that flock was chosen because it is supported by
scripting languages, but then, so is fcntl. In any case, I have corrected the problem by using
blocking read and write locks via fcntl, following the multiple-readers, single-writer paradigm.
So far it looks good, but I need more testing. I'll post the code as soon as it is done.
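
For readers who want to see the idea concretely, here is a minimal sketch of blocking fcntl locks in the multiple-readers, single-writer style. It is not the patch mentioned in this message, which had not been posted at this point. The helper names (lock_whole_file, read_count, bump_count, run_workers) and the counter file holding a single binary long are assumptions made only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: block until a lock of the given type (F_RDLCK or
 * F_WRLCK) on the whole file is granted. Retries if a signal interrupts. */
static int lock_whole_file(int fd, short type)
{
    struct flock fl;
    int rc;

    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                    /* 0 means "through end of file" */
    do {
        rc = fcntl(fd, F_SETLKW, &fl);   /* F_SETLKW is the blocking form */
    } while (rc == -1 && errno == EINTR);
    return rc;
}

/* Read the counter under a shared (read) lock. Returns -1 on error. */
long read_count(const char *path)
{
    long count = 0;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (lock_whole_file(fd, F_RDLCK) < 0)
        count = -1;
    else if (read(fd, &count, sizeof count) != (ssize_t)sizeof count)
        count = 0;                   /* empty or short file counts as zero */
    close(fd);                       /* closing the descriptor drops the lock */
    return count;
}

/* Read, increment, and write back under an exclusive (write) lock, so the
 * whole update is atomic with respect to other cooperating processes. */
long bump_count(const char *path)
{
    long count = 0;
    int fd = open(path, O_RDWR | O_CREAT, 0644);

    if (fd < 0)
        return -1;
    if (lock_whole_file(fd, F_WRLCK) < 0) {
        close(fd);
        return -1;
    }
    if (read(fd, &count, sizeof count) != (ssize_t)sizeof count)
        count = 0;
    count++;
    if (lseek(fd, 0, SEEK_SET) < 0 ||
        write(fd, &count, sizeof count) != (ssize_t)sizeof count)
        count = -1;
    close(fd);
    return count;
}

/* Fork nproc children that each bump the counter iters times, roughly
 * like running several trainers at once. Returns 0 if all succeeded. */
int run_workers(const char *path, int nproc, int iters)
{
    int i, j, status, ok = 0;

    for (i = 0; i < nproc; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            ok = -1;
            break;
        }
        if (pid == 0) {
            for (j = 0; j < iters; j++)
                if (bump_count(path) < 0)
                    _exit(1);
            _exit(0);
        }
    }
    while (wait(&status) > 0)
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            ok = -1;
    return ok;
}
```

Calling run_workers(path, 10, 50) on a fresh file and then read_count(path) should give 500 if the locking holds. Note that fcntl locks are advisory: they only protect against other processes that take the same locks.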
> getcount() would first look in the DB. If found, great; if not found, it'd
> read the file and output the value to the DB.
>
> putcount() would output the value to the DB.
We seem to be working on similar things here.
I have now implemented the same thing. Perhaps we need a code merge?
BTW, I called the token '.MSG_COUNT' just in case we start accepting words that begin with a period.
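
A rough sketch of the getcount()/putcount() behaviour described above, for illustration only. The real code would talk to the word database; here a tiny in-memory table (toy_db) stands in for it, and the legacy count file is assumed to hold a single decimal number. Both of those, and the function signatures, are assumptions rather than anything taken from either patch.

```c
#include <stdio.h>
#include <string.h>

#define COUNT_TOKEN ".MSG_COUNT"     /* token name used in this message */
#define MAX_TOKENS  64
#define MAX_KEYLEN  32

/* Hypothetical stand-in for the word database: a small key/value table. */
struct toy_db {
    char key[MAX_TOKENS][MAX_KEYLEN];
    long val[MAX_TOKENS];
    int  n;
};

/* Return a pointer to the stored value for key, or NULL if absent. */
static long *toy_find(struct toy_db *db, const char *key)
{
    int i;

    for (i = 0; i < db->n; i++)
        if (strcmp(db->key[i], key) == 0)
            return &db->val[i];
    return NULL;
}

/* Insert or overwrite key. Returns 0 on success, -1 if the table is full. */
static int toy_put(struct toy_db *db, const char *key, long value)
{
    long *slot = toy_find(db, key);

    if (slot != NULL) {
        *slot = value;
        return 0;
    }
    if (db->n >= MAX_TOKENS)
        return -1;
    strncpy(db->key[db->n], key, MAX_KEYLEN - 1);
    db->key[db->n][MAX_KEYLEN - 1] = '\0';
    db->val[db->n] = value;
    db->n++;
    return 0;
}

/* putcount(): store the message count under the reserved token. */
int putcount(struct toy_db *db, long count)
{
    return toy_put(db, COUNT_TOKEN, count);
}

/* getcount(): look in the DB first. If the token is missing, fall back to
 * the legacy count file (assumed format: one decimal number) and copy the
 * value into the DB so later calls no longer need the file. */
long getcount(struct toy_db *db, const char *legacy_path)
{
    long *stored = toy_find(db, COUNT_TOKEN);
    long count = 0;
    FILE *fp;

    if (stored != NULL)
        return *stored;
    fp = fopen(legacy_path, "r");
    if (fp != NULL) {
        if (fscanf(fp, "%ld", &count) != 1)
            count = 0;
        fclose(fp);
    }
    putcount(db, count);
    return count;
}
```

The leading period in the token name is what keeps the count from colliding with an ordinary word, as long as the tokenizer never produces words that start with one.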
-Gyepi