Floating point errors?
Ingomar Wesp
wesp at inode.at
Sun Jul 8 18:49:33 CEST 2007
Hello there.
I recently discovered that my bogofilter setup (bogofilter 1.1.3 on GNU/Linux)
has stopped working properly. Although classification was pretty good over the
last few years, bogofilter suddenly stopped detecting spam mails, even those
containing a whole lot of bad tokens.
To figure out what was wrong, I looked at the output of 'bogofilter -vvv'
for a mail that was an obvious example of spam. This is an abbreviated
version of what I saw:
| X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.1.3
|                       n  pgood      pbad        fw  U
| "A0ML2L"              1    nan  0.000039       nan  -
| "Aktien"             51    nan  0.001999       nan  -
| "Anlageempfehlung"   25    nan  0.000980       nan  -
| "Aufforderung"       27    nan  0.001058       nan  -
| "Boerse"             26    nan  0.001019       nan  -
| "Chartsanalyse"       1    nan  0.000039       nan  -
| "Der"               331    nan  0.012972       nan  -
| "Die"               537    nan  0.021045       nan  -
| "Diese"             291    nan  0.011404       nan  -
| "Frankfurt"         163    nan  0.006388       nan  -
| "Gesellschaft"       63    nan  0.002469       nan  -
| "hat"               664    inf  0.024650  0.000014  +
| "oder"              644    inf  0.015245  0.000014  +
| "fuer"              351    inf  0.012462  0.000026  +
It appears that classification fails because of a lack of floating-point
precision in the calculation of pgood and fw. Looking up some of the tokens
in my wordlist.db gives the following:
| > bogoutil -w ~/.bogofilter Aktien Anlageempfehlung Frankfurt fuer
|                   spam  good
| Aktien              51     0
| Anlageempfehlung    25     0
| Frankfurt          163     0
| fuer               318    33
From what I’ve figured out so far, the database does not appear to be broken.
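The nan/inf pattern in the two tables can be reproduced with plain IEEE 754
arithmetic, which C doubles follow on typical GNU/Linux platforms: 0/0 gives
nan and a positive number divided by 0 gives inf. The minimal Python sketch
below assumes (this is not checked against bogofilter's source) that pgood is
a token's good count divided by the number of good messages recorded in the
wordlist, and tests the hypothesis that this message count is 0. Python raises
ZeroDivisionError on float division by zero, so the hypothetical helper
ieee_div emulates the C behaviour. MSGS_BAD = 25518 is only an approximation
back-calculated from the posted pbad column (318 / 0.012462).

```python
import math

def ieee_div(num: float, den: float) -> float:
    """Divide the way C doubles do under IEEE 754: x/0 -> +/-inf, 0/0 -> nan.

    Python raises ZeroDivisionError for division by zero, so the
    IEEE behaviour is emulated explicitly here.
    """
    if den == 0:
        if num == 0 or math.isnan(num):
            return math.nan
        return math.copysign(math.inf, num)
    return num / den

def token_probs(spam: int, good: int, msgs_bad: int, msgs_good: int):
    """Return (pgood, pbad) for one token.

    Assumption (hypothetical): pgood = good / msgs_good and
    pbad = spam / msgs_bad, i.e. per-message-count frequencies.
    """
    return ieee_div(good, msgs_good), ieee_div(spam, msgs_bad)

# Token counts from the bogoutil lookup. MSGS_BAD is an approximate
# back-calculation; MSGS_GOOD = 0 is the hypothesis being tested.
MSGS_BAD, MSGS_GOOD = 25518, 0
for token, spam, good in [("Aktien", 51, 0), ("Anlageempfehlung", 25, 0),
                          ("Frankfurt", 163, 0), ("fuer", 318, 33)]:
    pgood, pbad = token_probs(spam, good, MSGS_BAD, MSGS_GOOD)
    print(f"{token:<18} {pgood:>5} {pbad:.6f}")
```

Under this assumption the three tokens with a good count of 0 come out as nan
and "fuer" (33 good) as inf, mirroring the pgood column, while the computed
pbad values round to the ones bogofilter printed.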
Has anybody encountered similar behaviour on their setup? Are there any
known fixes? And if not, does anybody know the exact formula that could
produce these floating-point errors?
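For reference, a Robinson-style per-token estimate can be written down and
checked against the posted fw column. This is a sketch, not bogofilter's
actual code: it assumes p(w) = pbad / (pbad + pgood) and
f(w) = (s*x + n*p(w)) / (s + n), where n is the number of messages containing
the token. ROBS = 0.0178 and ROBX = 0.52 are assumed to stand in for
bogofilter's robs and robx options; they were chosen because they reproduce
the printed values. Under these assumptions an infinite pgood forces p(w) to
0, so f(w) collapses to s*x/(s+n), while a nan pgood propagates into a nan f(w).

```python
import math

# Assumed values standing in for bogofilter's robs (s) and robx (x).
ROBS = 0.0178
ROBX = 0.52

def robinson_fw(pbad: float, pgood: float, n: int,
                s: float = ROBS, x: float = ROBX) -> float:
    """Robinson's smoothed token probability f(w) (sketch).

    p(w) = pbad / (pbad + pgood) is pulled towards the prior x with
    strength s; n is the number of messages that contain the token.
    """
    p = pbad / (pbad + pgood)  # pgood = inf -> 0.0; pgood = nan -> nan
    return (s * x + n * p) / (s + n)

# n, pgood and pbad copied from the bogofilter -vvv output above
for token, n, pgood, pbad in [("Aktien", 51, math.nan, 0.001999),
                              ("hat", 664, math.inf, 0.024650),
                              ("oder", 644, math.inf, 0.015245),
                              ("fuer", 351, math.inf, 0.012462)]:
    print(f"{token:<8} {robinson_fw(pbad, pgood, n):.6f}")
```

With a finite pgood of 0 (a token seen only in spam, such as "Aktien"), the
same sketch gives an f(w) close to 1 instead of nan.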
Any advice would be much appreciated, especially since manually sorting out
loads of spam is not a particularly entertaining task ;-)
So, thanks in advance and have a pleasant week,
Ingomar Wesp
--
http://ingomar.wesp.name/
More information about the Bogofilter mailing list