significant digits?

Jonathan Buzzard jonathan at buzzard.org.uk
Fri Apr 11 13:27:01 CEST 2003


relson at osagesoftware.com said:
> Calculations are done using variables of type "double", which varies
> between architectures.  On Intel, a double is 8 bytes (64 bits).
> According to file float.h, a double has 15 decimal digits of
> precision.  Bogofilter uses as much as the hardware will give it. 
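
The figures quoted above can be checked directly. A minimal sketch, assuming CPython, whose float is a C double, so `sys.float_info` reports the same values as the DBL_* macros in float.h:

```python
import struct
import sys

# CPython's float is a C double, so sys.float_info mirrors the
# DBL_* macros from float.h on the same machine.
info = sys.float_info
double_bytes = struct.calcsize("d")        # sizeof(double)

print("sizeof(double):", double_bytes)     # 8 on Intel
print("DBL_MANT_DIG:  ", info.mant_dig)    # 53 binary digits of mantissa
print("DBL_DIG:       ", info.dig)         # 15 decimal digits of precision
print("DBL_EPSILON:   ", info.epsilon)     # 2**-52
```

Note that the 15 decimal digits come from a 53-bit binary mantissa; the precision is binary, not decimal.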

You miss the point. Because 0.95 cannot be represented exactly as a
binary fraction, it is a poor choice for a cutoff value. My numerical
analysis notes from university are quite adamant that floating-point
comparisons should, whenever possible, be made against a number that
can be represented exactly as a binary fraction.
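
One way to see which cutoffs are safe is to compare the decimal value with the double it turns into. A small sketch using Python's `fractions.Fraction` for exact arithmetic; `is_exact` is a hypothetical helper name, and 0.9375 (15/16) and 0.953125 (61/64) are merely example cutoffs near 0.95 that happen to be binary fractions:

```python
from fractions import Fraction

def is_exact(decimal_text):
    """True if the decimal literal converts to a double with no rounding."""
    return Fraction(float(decimal_text)) == Fraction(decimal_text)

for cutoff in ("0.95", "0.9375", "0.953125"):
    # as_integer_ratio() gives the exact value the double actually holds
    num, den = float(cutoff).as_integer_ratio()
    print(f"{cutoff:>9}: stored as {num}/{den}, exact={is_exact(cutoff)}")
```

Only the first value is rounded: 0.95 is stored as a fraction over 2**52 rather than as 19/20.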

What happens is that you might get a spamicity that is just a bit
less than the cutoff value of 0.95, but that still gets classified as
spam because 0.95 is represented internally as a slightly smaller
number. You then get puzzled people wondering why a message with a
spamicity below the specified cutoff of 0.95 was classified as spam.
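
The scenario can be reproduced with exact rational arithmetic. A sketch, assuming IEEE-754 doubles and a hypothetical spam test of the form `spamicity >= CUTOFF`; the spamicity 0.95 - 1e-17 is an invented value chosen to fall between the stored cutoff and the true 0.95:

```python
from fractions import Fraction

CUTOFF = 0.95                     # the double the program actually compares against
exact_cutoff = Fraction(19, 20)   # the 0.95 the user asked for

# The stored cutoff is slightly below 0.95 (by roughly 4.4e-17).
print(Fraction(CUTOFF) < exact_cutoff)          # True
print(float(exact_cutoff - Fraction(CUTOFF)))

# Hypothetical spamicity whose true value is just under 0.95 ...
true_spamicity = exact_cutoff - Fraction(1, 10**17)
computed = float(true_spamicity)  # ... rounds to the very same double as 0.95

# Below the cutoff on paper, yet the comparison calls it spam.
print(true_spamicity < exact_cutoff, computed >= CUTOFF)   # True True
```

With a cutoff such as 0.9375, the stored double and the intended value coincide, so this particular mismatch cannot arise.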

It is a bit like using a step size of 0.1 in a floating-point loop,
though not as bad.
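
For comparison, a sketch of the step-size problem, assuming IEEE-754 doubles with no extended-precision intermediates. `count_steps` is a hypothetical helper that counts the iterations of the C-style loop `for (x = 0; x < 1.0; x += step)`:

```python
def count_steps(step, stop=1.0):
    """Count iterations of: for (x = 0.0; x < stop; x += step)."""
    x, n = 0.0, 0
    while x < stop:
        n += 1
        x += step
    return n

# 0.1 is not a binary fraction: after ten additions x is
# 0.9999999999999999, still below 1.0, so the loop runs once more.
print(count_steps(0.1))    # 11, not 10

# 0.125 = 1/8 is exact, so x lands on 1.0 exactly and the loop stops.
print(count_steps(0.125))  # 8
```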

JAB.

-- 
Jonathan A. Buzzard                 Email: jonathan at buzzard.me.uk
Northumberland, United Kingdom.       Tel: +44 1661-832195






More information about the Bogofilter mailing list