Floating point errors?

David Relson relson at osagesoftware.com
Tue Jul 24 00:57:27 CEST 2007


On Mon, 23 Jul 2007 22:29:44 +0200 (CEST)
Pavel Kankovsky wrote:

> On Tue, 17 Jul 2007, Ingomar Wesp wrote:
> 
> > For some reason, when manually marking spam or ham, bogofilter was
> > always called with the -N and -S options respectively, even if the
> > message was not previously registered at all.
> 
> Ugh. Perhaps Bogofilter should provide some protection against this
> kind of mistake. Would it make sense to complain when a message that
> has never been registered is being unregistered? (It would be quite
> easy to implement imho: compute a hash of the token list generated from
> the message, turn it into a quasitoken like .MSG_COUNT, increment its
> count during registration, check and decrement it during
> unregistration.)
> 
> > I assume that this led to a condition where the individual spam
> > counts of several tokens were larger than the overall spam message
> > count.
> 
> This is quite likely.

Hi Pavel,

My .MSG_COUNT values are approximately 550,000 and 140,000.  Adding a
"dot" token for each message would add many, many tokens to my wordlist.
As I don't believe I need them, this seems wasteful.  On the other hand,
if you (or someone else) want to implement such a capability, it could
be an option.
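
For concreteness, here is a minimal sketch of the per-message quasi-token
idea.  It is an illustration only, not bogofilter code: it assumes a plain
Python dict stands in for the wordlist, a single counter per quasi-token,
and a hypothetical ".MSG_<sha1>" naming scheme.

```python
import hashlib


def message_key(tokens):
    """Build a hypothetical quasi-token name from a message's token list."""
    # Sort the distinct tokens so the same message always hashes the same way.
    canonical = "\n".join(sorted(set(tokens)))
    digest = hashlib.sha1(canonical.encode("utf-8")).hexdigest()
    return ".MSG_" + digest  # assumed naming scheme, not a bogofilter token


def register(wordlist, tokens):
    """Record that this message has been registered once more."""
    key = message_key(tokens)
    wordlist[key] = wordlist.get(key, 0) + 1


def unregister(wordlist, tokens):
    """Return False (and change nothing) if the message was never registered."""
    key = message_key(tokens)
    if wordlist.get(key, 0) == 0:
        return False  # caller should complain instead of decrementing counts
    wordlist[key] -= 1
    if wordlist[key] == 0:
        del wordlist[key]  # drop exhausted quasi-tokens to limit growth
    return True
```

Note that this adds one extra entry per distinct registered message, which
is exactly the wordlist growth I am concerned about.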

When a token's ham or spam count exceeds the corresponding .MSG_COUNT
value, it's an indication that something is b0rked.  Generating an error
message might be appropriate.  That raises a new issue: how to make the
problem known when bogofilter is running in the background.

As a more modest proposal, checking each token's ham and spam counts
against .MSG_COUNT wouldn't use much computing power and might be
helpful...
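
A rough sketch of that check, again only as an illustration: it assumes the
wordlist is a dict mapping each token to a (spam, ham) pair, that
".MSG_COUNT" holds the message totals in the same order, and that tokens
starting with a dot are meta tokens to be skipped.  The function name is
made up for this example.

```python
def find_inconsistent_tokens(wordlist):
    """Return tokens whose spam or ham count exceeds the message totals."""
    msg_spam, msg_ham = wordlist[".MSG_COUNT"]
    bad = []
    for token, (spam, ham) in wordlist.items():
        if token.startswith("."):
            continue  # assumed convention: dot tokens are bookkeeping entries
        if spam > msg_spam or ham > msg_ham:
            bad.append(token)
    return sorted(bad)
```

This is a single pass over the wordlist with two comparisons per token, so
the cost is small; what to do with the resulting list (log it, print a
warning, refuse to score) is a separate decision.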

Regards,

David



