bogofilter busy-loops

David Relson relson at osagesoftware.com
Thu May 29 14:20:07 CEST 2003


Hi Marek,

The backtrace looks familiar.  Because of the way bogofilter uses flex, the 
parsing scanner uses a fixed-size buffer (8k or 16k, IIRC).  A long string 
of letters without any punctuation will fill this buffer and cause a 
"scanner REJECT" message.  Since bogofilter is going to ignore tokens 
longer than MAXTOKENLEN, which is 30, the check_alphanum() routine was 
added to detect this condition so that bogofilter can discard enough of the 
unwanted characters.  This keeps the lexer from complaining.  This is new 
code and is being tweaked as I learn more about how the scanner interprets 
the character set.
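
To illustrate the idea, here is a minimal sketch of that kind of check.  It 
is not the actual bogofilter code: the names long_run_length() and 
discard_long_run() are made up for this example, and treating bytes with 
the high bit set as token characters is an assumption.  Only the 
MAXTOKENLEN value of 30 comes from the description above.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAXTOKENLEN 30  /* value quoted in this message */

/* Hypothetical helper, not bogofilter's real routine.
 * Returns the length of the leading run of token characters if that run
 * is longer than MAXTOKENLEN (so it can be dropped), otherwise 0. */
static size_t long_run_length(const char *buf, size_t count)
{
    size_t i = 0;

    while (i < count) {
        /* Cast to unsigned char: passing a negative char value to
         * isalnum() is undefined behaviour in C. */
        unsigned char c = (unsigned char) buf[i];

        /* Assumption: high-bit bytes count as token characters. */
        if (!(isalnum(c) || c >= 0x80))
            break;
        i++;
    }
    return (i > MAXTOKENLEN) ? i : 0;
}

/* Drops an over-long leading run in place and returns the new byte
 * count.  Each call inspects the buffer exactly once, so a caller can
 * always tell whether anything was removed. */
static size_t discard_long_run(char *buf, size_t count)
{
    size_t skip = long_run_length(buf, count);

    if (skip > 0)
        memmove(buf, buf + skip, count - skip);
    return count - skip;
}
```

With a buffer holding 59 bytes of 0xFF, as in the backtrace below, this 
sketch would report a run of 59 and leave an empty buffer for the scanner.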

Anyhow, to keep the story from going on forever, the routine is now called 
may_be_long_token() and is in file src/lexer.c.  There have been changes to 
it for the most recent release - bogofilter-0.13.3.  (Look in the NEWS for 
mention of "parser tweaks").  I suggest you update and see if the changes 
correct your problem.

If you still have trouble, gzip one of the messages and your bogofilter.cf 
(if you have one) and send them directly to me.  Also, include info on your 
setup.  What are your locale settings?  "set | grep LC_" will give me what 
I need.

David


At 05:20 AM 5/29/03, Marek Kowal wrote:
>Hi there,
>
>Within one hour four letters arrived at my system and all of them caused 
>bogofilter to busyloop. Those are emails with attachments (mpg files in 
>base64 encoding). I've tested it with several databases, but it busyloops 
>anyway. The problem occurs in version 0.13.0, it does not occur with 
>bogofilter-0.11.2.
>
>gdb localizes the busy-loop here:
>
>(gdb) bt
>#0  check_alphanum (buf=0x80a2199 'ÿ' <repeats 59 times>, count=59)
>     at lexer.c:68
>#1  0x0804ec32 in yyinput (buf=0x80a2199 'ÿ' <repeats 59 times>, max_size=8192)
>     at lexer.c:267
>#2  0x0804fae5 in yy_get_next_buffer () at lexer_v3.c:5686
>#3  0x0804f935 in lexer_v3_lex () at lexer_v3.c:5520
>#4  0x080522e5 in get_token () at token.c:68
>#5  0x0804c5f5 in collect_words (wh=0x80a6170) at collect.c:48
>#6  0x08049741 in bogofilter () at bogofilter.c:67
>#7  0x08049c1f in classify () at main.c:287
>#8  0x080499aa in arg_foreach (hook=0x8049bf0 <classify>, argc=0,
>     argv=0xbffffd8c) at main.c:155
>#9  0x08049ac3 in main (argc=0, argv=0xbffffd8c) at main.c:205
>
>Haven't tested the thing against the latest version from CVS, but I 
>suspect the problem is still there, as nobody announced any changes to the 
>lexer between 0.13.0 and 0.13.3.
>
>I am rather reluctant to send the "broken" files directly to the list - 
>this is private correspondence, not belonging to me. Whom should I send 
>the files to directly?
>
>Cheers,
>Marek
>




