Bug reading mbox?

Sat Nov 12 14:46:45 CET 2005

On Thu, 10 Nov 2005 16:43:08 +0100
Boris 'pi' Piwinger wrote:

> Hi!
> 
> I see some problem with the new version. When doing a
> (re)training session with bogominitrain.pl I first go
> through the messages one by one to do the training and then
> check the complete mbox in one run. Until the last version
> it worked without a problem. Now suddenly, I get a lot of
> mistakes (e.g. saying that I have 77 false negatives), but when
> checking them one by one there are no false negatives. So it
> looks like -M is producing errors. Sorry, I did not have
> time yet to check the details.
> 
> pi

For all you bogofilter users processing mailboxes, pi has detected a
parsing defect that affects bogofilter versions 0.96.3 through 0.96.5.
It's most likely to occur when registering or scoring a mailbox, but
can also occur when using bogofilter's '-b' and '-B' options.

The bug was introduced in 0.96.3.  When doing Unicode processing,
lexer.c uses an extra text buffer (because the processing can increase
the byte count of the raw text).  On the other hand, when skipping a
binary attachment the extra buffer isn't needed.  The pointer for
determining which buffer to use was being handled incorrectly.  In some
cases, that leads to incorrect tokens being generated.
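
To illustrate the kind of problem described above, here is a minimal
sketch in C.  It is not the actual lexer.c code: every name in it
(lexbuf_t, load_chunk, load_chunk_stale, fake_convert) is made up for
the example, and the "conversion" simply doubles each byte to mimic
text that grows.  It only shows how a pointer that selects between a
raw buffer and a conversion buffer can go stale when one code path
forgets to update it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; this is NOT bogofilter's lexer.c. */
enum { RAW_SIZE = 64, CONV_SIZE = 4 * RAW_SIZE };

typedef struct {
    char raw[RAW_SIZE];    /* bytes as read from the message */
    char conv[CONV_SIZE];  /* larger buffer: conversion can add bytes */
    const char *active;    /* buffer the tokenizer should read from */
    size_t len;            /* number of valid bytes in *active */
} lexbuf_t;

/* Stand-in for charset conversion: doubles each byte to mimic growth. */
static size_t fake_convert(const char *src, size_t n, char *dst, size_t cap)
{
    size_t out = 0;
    for (size_t i = 0; i < n && out + 2 <= cap; i++) {
        dst[out++] = src[i];
        dst[out++] = src[i];
    }
    return out;
}

/* Selects the buffer explicitly on every call, for both kinds of chunk. */
static void load_chunk(lexbuf_t *lb, const char *data, size_t n, int is_binary)
{
    assert(n <= RAW_SIZE);
    memcpy(lb->raw, data, n);
    if (is_binary) {
        /* Skipped attachment: no conversion, so read the raw buffer. */
        lb->active = lb->raw;
        lb->len = n;
    } else {
        lb->len = fake_convert(lb->raw, n, lb->conv, CONV_SIZE);
        lb->active = lb->conv;
    }
}

/* Faulty variant: the binary path updates len but leaves active alone,
 * so the tokenizer keeps reading whatever buffer the last chunk chose. */
static void load_chunk_stale(lexbuf_t *lb, const char *data, size_t n,
                             int is_binary)
{
    assert(n <= RAW_SIZE);
    memcpy(lb->raw, data, n);
    if (is_binary) {
        lb->len = n;  /* active is not reset here */
    } else {
        lb->len = fake_convert(lb->raw, n, lb->conv, CONV_SIZE);
        lb->active = lb->conv;
    }
}
```

With load_chunk_stale, a text chunk followed by a binary chunk leaves
"active" pointing at the old converted text, so the tokenizer would see
leftover bytes rather than the current input.  In a single-message run
there may be no earlier chunk to leave stale data behind, which could
be one reason such a defect is easier to hit when a whole mailbox is
processed in one pass.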

The fix is in CVS and will be in 0.96.6 (a.k.a. 1.0.0-rc6) later this
weekend.

Regards,

David