Bug reading mbox? (was: bogofilter-0.96.5 a.k.a. 1.0.0rc5)

Boris 'pi' Piwinger 3.14 at piology.org
Fri Nov 11 19:37:37 CET 2005


David Relson <relson at osagesoftware.com> wrote:

>> I see some problem with the new version. When doing a
>> (re)training session with bogominitrain.pl I first go
>> through the messages one by one to do the training and then
>> check the complete mbox in one run. Until the last version
>> it worked without a problem. Now suddenly, I get a lot of
>> mistakes (e.g. saying that I have 77 false negatives), but when
>> checking them one by one there are no false negatives. So it
>> looks like -M is producing errors. Sorry, I did not have
>> time yet to check the details.
>
>Early unicode versions of bogofilter did the conversion to unicode
>before decoding (base64 or qp).  

I would not really know how to do that, but anyway:

>_That_ problem was corrected in 0.96.2.

Fine.

>However the fix revealed some problems when image attachments were
>decoded and run through iconv (for conversion to unicode).  Long ago
>changes were made so that bogofilter would skip binary attachments, but
>image and application attachments were overlooked.  _Those_ problems
>were fixed in 0.96.3.

Also fine.

>There have been no changes to '-M' or related code.  I suspect that
>what you're seeing now is the result of the skipping of binary
>attachments and the resulting change in tokens generated. 

Actually, my problem could also be described as follows:
one and the same version of bogofilter reports different
bogosities for the same mails, depending on whether they are
checked in batch mode (-M on the whole mbox) or individually.

>More information is needed to be sure.

Sure. I'll run some tests.

>To see how new and old versions of bogofilter are scoring mailboxes,
>try the following:

Actually, I don't have different versions and this is not my
concern anyway. I am concerned about different ratings with
only one version.

Here is the first test:

$ bogofilter -o 0.7,0.2 -vM <spam|grep -v Spam
X-Bogosity: Unsure, spamicity=0.665304, version=0.96.5
X-Bogosity: Unsure, spamicity=0.643442, version=0.96.5
X-Bogosity: Unsure, spamicity=0.695795, version=0.96.5
X-Bogosity: Unsure, spamicity=0.612669, version=0.96.5

When I run each message individually, every one of them is
classified as Spam. Also note that I already double-checked
that the number of messages is correct.

On the other hand, the following command, which has formail
pass the messages to bogofilter one at a time, returns no output:
$ formail -s bogofilter -o 0.7,0.2 -vM <spam|grep -v Spam

I'll try to narrow the file spam down to a reasonably small
test case.
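
To see which messages are affected, the output of the two runs
above can be compared line by line. This is only a rough sketch:
it assumes that both runs print exactly one X-Bogosity line per
message, in mbox order. The function name compare_bogosity and the
file names batch.out and single.out are made up for illustration.

```shell
# Hypothetical helper: compare_bogosity BATCH_FILE SINGLE_FILE
# Pairs up line N of both files and, where they differ, prints the
# 1-based message number followed by both X-Bogosity lines.
# Assumes the lines contain no '|' character.
compare_bogosity() {
    paste -d '|' "$1" "$2" |
        awk -F '|' '$1 != $2 {
            printf "%d\n  batch:  %s\n  single: %s\n", NR, $1, $2
        }'
}

# Possible usage, with the same options as in the tests above:
#   bogofilter -o 0.7,0.2 -vM <spam >batch.out
#   formail -s bogofilter -o 0.7,0.2 -vM <spam >single.out
#   compare_bogosity batch.out single.out
```

The message numbers printed this way would show which mails to
keep when cutting the mbox down.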

pi


