significant digits?

David Relson relson at osagesoftware.com
Fri Apr 11 13:52:46 CEST 2003


At 07:27 AM 4/11/03, Jonathan Buzzard wrote:


>relson at osagesoftware.com said:
> > Calculations are done using variables of type "double", which varies
> > between architectures.  On Intel, a double is 8 bytes (64 bits).
> > According to file float.h, a double has 15 decimal digits of
> > precision.  Bogofilter uses as much as the hardware will give it.
>
>You miss the point. Because 0.95 cannot be represented precisely
>as a binary fraction, it represents a poor choice for a cutoff value.
>My numerical analysis notes from university are quite adamant that
>floating point comparisons should be against a number that can be
>precisely represented as a binary fraction whenever possible.
>
>What happens is that you might get a spamicity that is just a bit
>less than the cutoff value of 0.95, but it then gets classified as spam
>because 0.95 is represented as a slightly smaller number internally. You
>then get puzzled people wondering why something with a spamicity less
>than the specified cutoff value of 0.95 is classified as spam.
>
>It is a bit like using a step size of 0.1, though not as bad.
>
>JAB.

Jonathan,

Agreed, 0.95 cannot be represented exactly as a binary fraction.  I think
it's silly to worry about that.  Most decimal fractions are infinitely
repeating when written in binary, just as most fractions are infinitely
repeating when written in decimal.
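
To make the representation issue concrete, here is a minimal Python sketch
(not bogofilter code) that prints the exact value a 64-bit double stores for
0.95 and compares it with 0.875, which is a sum of powers of two.  It assumes
IEEE 754 doubles, which is what Python uses on common hardware.

```python
from decimal import Decimal
from fractions import Fraction

# Decimal(float) shows the exact value the double actually stores.
stored = Decimal(0.95)
print(stored)

# The stored value is slightly below the decimal number 0.95.
is_below = Fraction(0.95) < Fraction(95, 100)
print(is_below)

# 0.875 = 1/2 + 1/4 + 1/8, so the double holds it exactly.
is_exact = Fraction(0.875) == Fraction(7, 8)
print(is_exact)
```

The stored value begins 0.94999999999999995559..., so the difference from
0.95 shows up only around the 17th significant digit.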

Exact binary representations are somewhat useful if you're doing equality
checks.  They're minimally useful when doing inequality checks.  Bogofilter
uses the spam_cutoff value to divide the results into two groups -
those with scores above and those with scores below.  If we used a value
that did have an exact binary representation, it wouldn't act much differently.
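
A rough sketch of why the inexact cutoff barely matters for an inequality
check follows.  The function is_spam is hypothetical, and the ">=" comparison
is an assumption made for illustration, not a statement of how bogofilter
compares scores.  It needs Python 3.9 or later for math.ulp and
math.nextafter.

```python
import math

def is_spam(score, cutoff=0.95):
    # Hypothetical classifier: assume "spam" means score >= cutoff.
    return score >= cutoff

cutoff = 0.95
# Spacing between adjacent doubles near 0.95.
gap = math.ulp(cutoff)
# The largest double strictly below the stored cutoff.
just_below = math.nextafter(cutoff, 0.0)

results = [
    is_spam(0.9499),      # clearly below the cutoff
    is_spam(just_below),  # one representable step below
    is_spam(cutoff),      # exactly the stored cutoff
    is_spam(0.9501),      # clearly above the cutoff
]
print(gap, results)
```

Under these assumptions, only a score within about 1e-16 of the cutoff
could be classified differently from what the decimal value suggests.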

We could find an exact binary fraction near 0.95 - perhaps 61/64 (0.953125)
is close enough - but what's the point?
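
For anyone who does want an exactly representable cutoff, the nearest
fraction with a power-of-two denominator is easy to compute.  This is a
small sketch; nearest_dyadic is a hypothetical helper name, and the choice of
7 bits (denominator 128) is only an example.

```python
from fractions import Fraction

def nearest_dyadic(x, bits):
    # Round x to the nearest multiple of 1/2**bits (hypothetical helper).
    denom = 2 ** bits
    return Fraction(round(x * denom), denom)

# Denominator 128: 0.95 * 128 = 121.6, which rounds to 122.
candidate = nearest_dyadic(0.95, 7)
print(candidate, float(candidate))   # 61/64 0.953125

# The result converts to a double with no rounding error.
round_trips = Fraction(float(candidate)) == candidate
print(round_trips)
```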

You don't _have_ to use 0.95 as your spam_cutoff.  If you'd rather use 0.50 
or 0.75 or 0.875, just add the appropriate line to your bogofilter config file.

David