possible message count corruption?

Adrian Otto aotto at aotto.com
Thu Sep 19 20:20:05 CEST 2002


Jeremy,

> As mentioned previously, we're running some basic stress tests using
> bogofilter to classify our entire incoming mail load as spam, and
> something interesting happened.  Note that we're using 0.7.3, so
> it's possible this is known and fixed; we're looking seriously at
> deploying this in production within weeks, so we grabbed the latest
> version that was not explicitly tagged 'beta' on sf.net.

You should start using 0.7.4 today. I will be releasing it from beta this
afternoon. It has been stable, and there have been no serious bug reports.
We do have one bug which we will fix in an upcoming maintenance release:

http://sourceforge.net/tracker/index.php?func=detail&aid=609897&group_id=62265&atid=499997

Please see my comments below about the problem you are seeing...

> Anyway.  For each message we're running:
> bogofilter -v >> bogofilter.log; bogofilter -s -v >> bogofilter.log
> This ran fine for most of yesterday, but then last night I was glancing at
> the log and saw this:
>
> % grep '^bogofilter:' bogofilter-log
> bogofilter: 4172 messages on the spam list
> ...
> bogofilter: 5294 messages on the spam list
> bogofilter: 5295 messages on the spam list
> bogofilter: 5296 messages on the spam list
> bogofilter: 5297 messages on the spam list
> bogofilter: 5298 messages on the spam list
> bogofilter: 1 messages on the spam list
> bogofilter: 2 messages on the spam list
> bogofilter: 3 messages on the spam list
> bogofilter: 4 messages on the spam list
> bogofilter: 5 messages on the spam list
> ...
> bogofilter: 13054 messages on the spam list
>
> The 'increment' lines around the point the message count reset don't
> indicate that the word counts themselves reset.  I don't believe the count
> reset before this, but I don't have a log going back to the beginning (as
> fast as it grows, I've been nuking the log every few hours; I was more
> interested in tracking the system load and db sizes and was mostly logging
> to artificially inflate the load).
>
> Anyway, if this is a new or interesting report let me know and I can
> provide more information.  The entire log from that period is 120MB
> unzipped, 12MB bzipped.

As far as I know, this is a new discovery. Is this something that you are
able to reproduce? If so, please post a bug on SourceForge with steps to
reproduce the problem. This will be useful in addition to the raw log. Also,
set up an entry in syslog.conf to trap *.debug messages in a file. The lock
and unlock messages will be interesting. I just noticed that I'm missing a
call to openlog, so those messages are going unlabeled. I'll get a patch
into the next version for that.
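
For context on the missing call: openlog() sets the identifier that is
prepended to each syslog message, which is presumably why the lock and
unlock messages currently show up unlabeled. A minimal sketch of what such
a call could look like follows. The helper names, the "bogofilter" ident
string, and the LOG_MAIL facility are assumptions for illustration, not the
actual patch.

```c
#include <stddef.h>
#include <syslog.h>

/* Hypothetical helper, not bogofilter's actual code: label all later
 * syslog() calls with a program name and the process ID. */
void init_logging(const char *ident)
{
    /* LOG_PID adds the PID to each message; LOG_MAIL is an assumed facility. */
    openlog(ident, LOG_PID, LOG_MAIL);
}

/* Hypothetical trace message for a lock or unlock event, sent at debug
 * priority. Returns 0 on success, -1 if an argument is missing. */
int log_lock_event(const char *what, const char *path)
{
    if (what == NULL || path == NULL)
        return -1;
    syslog(LOG_DEBUG, "%s %s", what, path);
    return 0;
}
```

A program would call init_logging("bogofilter") once at startup, before the
first log_lock_event(). With a traditional syslogd, a *.debug selector
matches every facility at debug priority and above, so the messages would
be captured whichever facility is chosen here.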

You can upload your log file to http://aotto.com/incoming and I'll help
coordinate a solution once we figure out why this happens.
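
Since the full log is large, it may help to locate the exact reset points
before uploading. The sketch below scans a log for the "messages on the
spam list" lines quoted above and reports every place where the count
drops. The function names are hypothetical, and it assumes the line format
matches the grep output shown earlier in this message.

```c
#include <stdio.h>

/* Parse "bogofilter: N messages on the spam list".
 * Returns 1 and stores N in *count on a full match, 0 otherwise. */
int parse_spam_count(const char *line, long *count)
{
    long n = 0;
    int end = 0;

    /* %n is only stored if the whole pattern matched up to that point. */
    if (sscanf(line, "bogofilter: %ld messages on the spam list%n",
               &n, &end) == 1 && end > 0) {
        *count = n;
        return 1;
    }
    return 0;
}

/* Read log lines from 'in' and write a note to 'out' whenever the message
 * count is lower than the previous one. Returns the number of drops found.
 * Lines longer than the buffer are counted in pieces, so line numbers are
 * approximate for such logs. */
int count_resets(FILE *in, FILE *out)
{
    char line[1024];
    long prev = -1, cur = 0, lineno = 0;
    int resets = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        lineno++;
        if (!parse_spam_count(line, &cur))
            continue;               /* skip 'increment' and other lines */
        if (prev >= 0 && cur < prev) {
            fprintf(out, "line %ld: count dropped from %ld to %ld\n",
                    lineno, prev, cur);
            resets++;
        }
        prev = cur;
    }
    return resets;
}
```

Calling count_resets(stdin, stdout) from a small main() and feeding it the
log on standard input would print one line per drop. The line numbers could
then be used to cut out just the surrounding region of the log.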

Also, if you can send me the config.h file that the configure script
generated, and the output of "uname -a", that will help as well. You can
respond off the list, and I'll post a summary when we figure this out.

Thanks,

Adrian


For summary digest subscription: bogofilter-digest-subscribe at aotto.com


