Bug reading mbox? (was: bogofilter-0.96.5 a.k.a. 1.0.0rc5)
Boris 'pi' Piwinger
3.14 at piology.org
Fri Nov 11 19:37:37 CET 2005
David Relson <relson at osagesoftware.com> wrote:
>> I see some problem with the new version. When doing a
>> (re)training session with bogominitrain.pl I first go
>> through the messages one by one to do the training and then
>> check the complete mbox in one run. Until the last version
>> it worked without a problem. Now suddenly I get a lot of
>> mistakes (e.g. it says that I have 77 false negatives), but
>> when checking them one by one there are no false negatives.
>> So it looks like -M is producing errors. Sorry, I have not
>> had time yet to check the details.
>
>Early unicode versions of bogofilter did the conversion to unicode
>before decoding (base64 or qp).
I would not really know how to do that, but anyway:
>_That_ problem was corrected in 0.96.2.
Fine.
>However the fix revealed some problems when image attachments were
>decoded and run through iconv (for conversion to unicode). Long ago
>changes were made so that bogofilter would skip binary attachments, but
>image and application attachments were overlooked. _Those_ problems
>were fixed in 0.96.3.
Also fine.
>There have been no changes to '-M' or related code. I suspect that
>what you're seeing now is the result of the skipping of binary
>attachments and the resulting change in tokens generated.
Actually, my problem could also be described as follows:
bogofilter shows different bogosities when checking mails in
batch and in individual mode.
>More information is needed to be sure.
Sure. I'll run some tests.
>To see how new and old versions of bogofilter are scoring mailboxes,
>try the following:
Actually, I don't have different versions and this is not my
concern anyway. I am concerned about different ratings with
only one version.
Here is the first test:
$ bogofilter -o 0.7,0.2 -vM <spam|grep -v Spam
X-Bogosity: Unsure, spamicity=0.665304, version=0.96.5
X-Bogosity: Unsure, spamicity=0.643442, version=0.96.5
X-Bogosity: Unsure, spamicity=0.695795, version=0.96.5
X-Bogosity: Unsure, spamicity=0.612669, version=0.96.5
When I run each message individually, every one of them is
classified as Spam. Note that I have already double-checked
that the number of messages is correct.
On the other hand, the following command, which feeds the
messages to bogofilter one at a time, returns no output:
$ formail -s bogofilter -o 0.7,0.2 -vM <spam|grep -v Spam
I'll try to narrow the file spam down to a reasonable
test case.
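To find out which messages are rated differently, the two runs could be
saved without the grep and compared line by line. This is only a rough
sketch: the helper name diff_verdicts and the file names batch.txt and
single.txt are made up for illustration, and it assumes each run prints
exactly one X-Bogosity line per message, in mbox order.

```shell
# Hypothetical helper: print the 1-based message numbers whose verdict
# lines differ between two result files (one line per message in each).
diff_verdicts() {
    awk '
        NR == FNR { batch[FNR] = $0; next }   # first file: remember batch verdicts
        batch[FNR] != $0 {                    # second file: report mismatches
            printf "%d: batch=[%s] single=[%s]\n", FNR, batch[FNR], $0
        }
    ' "$1" "$2"
}

# Usage sketch, reusing the two commands from this message:
#   bogofilter -o 0.7,0.2 -vM <spam >batch.txt
#   formail -s bogofilter -o 0.7,0.2 -vM <spam >single.txt
#   diff_verdicts batch.txt single.txt
```

The reported numbers would then point at the messages worth extracting
into a small test mbox.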
pi