numerical problems

David Relson relson at osagesoftware.com
Tue Apr 8 23:46:39 CEST 2003


John,

There are _some_ safeguards in bogofilter to prevent a token's count from 
exceeding the .MSG_COUNT value.  Evidently something got overlooked.
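
For illustration only, a safeguard of this kind might look like the sketch
below. This is not bogofilter's actual code: the function names
clamp_token_count and token_fraction are hypothetical, and msg_count stands
in for the .MSG_COUNT value of the word list.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not taken from bogofilter): cap a token's count
 * at the number of messages registered in the word list. */
uint32_t clamp_token_count(uint32_t count, uint32_t msg_count)
{
    return (count > msg_count) ? msg_count : count;
}

/* Fraction of messages that contain the token.  Because the count is
 * clamped first, the result always lies in [0, 1]. */
double token_fraction(uint32_t count, uint32_t msg_count)
{
    assert(msg_count > 0);  /* an empty word list has no meaningful ratio */
    return (double)clamp_token_count(count, msg_count) / (double)msg_count;
}
```

With a clamp like this, a corrupt count such as 4294967294 would be treated
as "present in every message" rather than being fed to the numerical
routines as-is.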

In my quick test, which did include a count of 4294967294, everything came 
out right.  That's not too surprising as the test was quick and simple and 
the excess value check has worked for me in the past.  Undoubtedly your 
environment is different enough to cause a problem for bogofilter.

Do you still have the 0.7 databases and the message that gave you 
trouble?  If so, can you put them in a .tgz file and send them to me?

It would be a great help if you could send me as much relevant material 
as possible.  Please send it directly to me; there's no need to bother the 
mailing list with your data.

David

P.S.  The big numbers are likely the result of -S or -N causing a count to 
go negative.  That bug was fixed quite a while back; version 0.7 probably 
predates the fix.
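
To make the arithmetic concrete: 4294967294 is 2^32 - 2, which is what a
32-bit unsigned counter holds after 2 is subtracted from 0.  The sketch
below assumes the counts are stored as 32-bit unsigned integers; the
function names are hypothetical and are not bogofilter's own.

```c
#include <assert.h>
#include <stdint.h>

/* Naive unregistration: unsigned arithmetic wraps modulo 2^32, so
 * subtracting more than the stored count yields a huge value. */
uint32_t unregister_naive(uint32_t count, uint32_t n)
{
    return count - n;
}

/* Saturating variant: the count stops at zero instead of wrapping. */
uint32_t unregister_saturating(uint32_t count, uint32_t n)
{
    uint32_t result = (n > count) ? 0u : count - n;
    assert(result <= count);  /* unregistering never increases a count */
    return result;
}
```

Here unregister_naive(0, 2) returns 4294967294, matching the entry in the
dump, while unregister_saturating(0, 2) returns 0.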

At 12:13 PM 4/8/03, John Harper wrote:

>I recently converted a version 0.7 database to version 0.11.1.7.
>When bogofilter is processing some messages it simply goes away and
>loops indefinitely.
>
>Some debugging shows (after waiting a while to kill it):
>
>Program received signal SIGINT, Interrupt.
>0x000206c4 in gratio (a=0x40ae8, x=0x40af0, ans=0xffbef398, qans=0xffbef390,
>     ind=0x40c00) at dcdflib/src/dcdflib.c:1677
>1677        t *= (amn/ *x);
>(gdb) where
>#0  0x000206c4 in gratio (a=0x40ae8, x=0x40af0, ans=0xffbef398,
>     qans=0xffbef390, ind=0x40c00) at dcdflib/src/dcdflib.c:1677
>#1  0x0001de48 in cumgam (x=0x40af0, a=0x40ae8, cum=0x40ef8, ccum=0x40f04)
>     at dcdflib/src/dcdflib.c:419
>#2  0x0001ddf4 in cumchi (x=0xffbef388, df=0xffbef380, cum=0xffbef398,
>     ccum=0xffbef390) at dcdflib/src/dcdflib.c:362
>#3  0x0001da5c in cdfchi (which=0xffbef3a4, p=0xffbef398, q=0xffbef390,
>     x=0xffbef388, df=0xffbef380, status=0xffbef37c, bound=0xffbef370)
>     at dcdflib/src/dcdflib.c:223
>#4  0x00015964 in prbf (x=nan(0xfffffffffffff), df=210) at fisher.c:78
>#5  0x00015a70 in fis_get_spamicity (robn=105, P=
>       {mant = 4.1453536839921622e-121, exp = 0}, Q=
>       {mant = -Infinity, exp = -12800}) at fisher.c:97
>#6  0x0001544c in rob_compute_spamicity (wordhash=0xcca20, fp=0xffbef4b0)
>     at robinson.c:249
>#7  0x00015914 in rob_bogofilter (wordhash=0xcca20, fp=0x0) at robinson.c:343
>#8  0x00013c14 in bogofilter (xss=0xffbef648) at bogofilter.c:73
>#9  0x00013ebc in main (argc=260096, argv=0x0) at main.c:130
>
>
>Those P and especially Q values in fis_get_spamicity don't look so
>good. gratio is running around with NaNs.
>
>I dumped the spam database file and found a number of entries with very
>large counts, e.g.
>$140 4294967294 20030408
>which looked pretty suspect, so I grep'ed those out, created a new
>spam db and things seem to work ok.
>
>Now perhaps those numbers are an artifact of the conversion from the
>old db (although some sanity checks in bogoutil/bogoupgrade would be
>nice), but the fact that there are no checks before passing bad data
>into the numerical routines is worrisome. If for any reason the db
>gets corrupted bogofilter could just run off forever, and on a busy
>system using it as a front-end spam filter that could be disastrous.
>
>John Harper
>------------------------------------
>Academic Computing Coordinator
>Computing and Networking Services
>University of Toronto at Scarborough
>harper at utsc.utoronto.ca
>




