numerical problems

John Harper harper at utsc.utoronto.ca
Tue Apr 8 18:13:18 CEST 2003


I recently converted a version 0.7 database to version 0.11.1.7.
Now, when bogofilter processes certain messages, it never returns and
loops indefinitely.

Running it under gdb and interrupting it after a while shows:

Program received signal SIGINT, Interrupt.
0x000206c4 in gratio (a=0x40ae8, x=0x40af0, ans=0xffbef398, qans=0xffbef390, 
    ind=0x40c00) at dcdflib/src/dcdflib.c:1677
1677	    t *= (amn/ *x);
(gdb) where
#0  0x000206c4 in gratio (a=0x40ae8, x=0x40af0, ans=0xffbef398, 
    qans=0xffbef390, ind=0x40c00) at dcdflib/src/dcdflib.c:1677
#1  0x0001de48 in cumgam (x=0x40af0, a=0x40ae8, cum=0x40ef8, ccum=0x40f04)
    at dcdflib/src/dcdflib.c:419
#2  0x0001ddf4 in cumchi (x=0xffbef388, df=0xffbef380, cum=0xffbef398, 
    ccum=0xffbef390) at dcdflib/src/dcdflib.c:362
#3  0x0001da5c in cdfchi (which=0xffbef3a4, p=0xffbef398, q=0xffbef390, 
    x=0xffbef388, df=0xffbef380, status=0xffbef37c, bound=0xffbef370)
    at dcdflib/src/dcdflib.c:223
#4  0x00015964 in prbf (x=nan(0xfffffffffffff), df=210) at fisher.c:78
#5  0x00015a70 in fis_get_spamicity (robn=105, P=
      {mant = 4.1453536839921622e-121, exp = 0}, Q=
      {mant = -Infinity, exp = -12800}) at fisher.c:97
#6  0x0001544c in rob_compute_spamicity (wordhash=0xcca20, fp=0xffbef4b0)
    at robinson.c:249
#7  0x00015914 in rob_bogofilter (wordhash=0xcca20, fp=0x0) at robinson.c:343
#8  0x00013c14 in bogofilter (xss=0xffbef648) at bogofilter.c:73
#9  0x00013ebc in main (argc=260096, argv=0x0) at main.c:130


Those P and especially Q values in fis_get_spamicity (frame #5) don't
look right: Q has a mantissa of -Infinity, and prbf is then called with
x = NaN. gratio is looping on NaNs.
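
One plausible mechanism, sketched here for illustration only: every ordered comparison involving NaN is false, so a convergence test of the form "stop when |t| <= tol" can never succeed once t is NaN. Whether the loop in gratio has exactly this shape is an assumption; the function name and the iteration cap below are hypothetical and are not part of dcdflib.

```c
#include <math.h>

/* Repeat t *= ratio until |t| <= tol, giving up after max_iter steps.
 * Returns the number of multiplications performed, or -1 if the cap
 * was reached without converging. */
int iterate_until_small(double t, double ratio, double tol, int max_iter)
{
    int i;

    for (i = 0; i < max_iter; i++) {
        if (fabs(t) <= tol)   /* always false when t is NaN */
            return i;
        t *= ratio;
    }
    return -1;                /* without the cap, NaN would spin forever */
}
```

With finite input the loop ends after a handful of steps; with NaN input only the cap stops it.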

I dumped the spam database file and found a number of entries with very
large counts, e.g.
$140 4294967294 20030408
which looked pretty suspect, so I grepped those out, created a new
spam db, and things seem to work OK.
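
Note that 4294967294 is 2^32 - 2, which is what -2 becomes when stored in an unsigned 32-bit integer, so a count that was decremented below zero is one possible (unconfirmed) origin. A filter for such entries could look roughly like the sketch below. It assumes each dumped line has the form "token count date" as in the example above; the function name and the threshold are hypothetical, not anything bogoutil provides.

```c
#include <stdio.h>

/* Hypothetical ceiling: no real token should be seen this many times. */
#define MAX_PLAUSIBLE_COUNT 1000000000UL

/* Check one dumped line of the assumed form "token count date".
 * Returns 1 if the count looks plausible, 0 if it is suspect or the
 * line cannot be parsed. */
int count_is_plausible(const char *line)
{
    char token[256];
    unsigned long count, date;

    if (sscanf(line, "%255s %lu %lu", token, &count, &date) != 3)
        return 0;
    return count <= MAX_PLAUSIBLE_COUNT;
}
```

Lines for which this returns 0 would be dropped or reported before the new database is built.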

Now perhaps those numbers are an artifact of the conversion from the
old db (although some sanity checks in bogoutil/bogoupgrade would be
nice), but the lack of any checks before bad data is passed into the
numerical routines is worrisome. If the db gets corrupted for any
reason, bogofilter could run forever, and on a busy system using it as
a front-end spam filter that could be disastrous.
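
The kind of check meant here could be as small as the following sketch. It is not bogofilter's code: both function names are hypothetical, the chi-square routine is passed in as a plain function pointer rather than through the real dcdflib interface, and what value to fall back to is left to the caller.

```c
#include <math.h>

/* Hypothetical guard: returns 1 only if x and df are safe to hand to a
 * chi-square CDF routine (x finite and non-negative, df finite and
 * positive), 0 otherwise. */
int chi2_args_ok(double x, double df)
{
    return isfinite(x) && x >= 0.0 && isfinite(df) && df > 0.0;
}

/* Hypothetical wrapper: call cdf only on sane input; otherwise return
 * the caller-supplied fallback instead of entering the numerical code. */
double guarded_chi2(double (*cdf)(double x, double df),
                    double x, double df, double fallback)
{
    if (!chi2_args_ok(x, df))
        return fallback;
    return cdf(x, df);
}
```

With a guard like this, the x = NaN seen in frame #4 would be rejected before cdfchi is ever called.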

John Harper
------------------------------------
Academic Computing Coordinator
Computing and Networking Services
University of Toronto at Scarborough
harper at utsc.utoronto.ca




More information about the bogofilter-dev mailing list