oddity.

David Relson relson at osagesoftware.com
Mon Apr 14 13:59:22 CEST 2003


At 01:12 AM 4/14/03, michael at optusnet.com.au wrote:


>In bogofilter.c:
>     /* tokenize input text and save words in a wordhash. */
>     do {
>         collect_words(&wordhash, &wordcount, &cont);
>         ++msgcount;
>     } while(cont);
>
>Shouldn't that be free'ing wordhash somewhere before it overwrites it?

bogofilter() is the message classification function.  When classifying, 
bogofilter expects only one message per file.  There is a comparable loop 
in register_messages(), and there it is needed.  When registering messages, 
bogofilter expects multiple messages in an mbox formatted file (with 
"^From " separators between messages).  collect_words() returns when it 
encounters a message separator or an EOF.  In register_messages() the 
wordhashes from individual messages are collected in a cumulative wordhash 
and then added to the wordlist.
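
A rough sketch of that registration loop, for illustration only.  The names 
collect_one(), count_words() and register_all() are made up for this 
example, and a plain word count stands in for the real wordhash; this is 
not the bogofilter source.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Count whitespace-separated words in the first len bytes of s. */
size_t count_words(const char *s, size_t len)
{
    size_t n = 0;
    int in_word = 0;

    for (size_t i = 0; i < len; i++) {
        if (isspace((unsigned char)s[i])) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            n++;
        }
    }
    return n;
}

/* Hypothetical stand-in for collect_words(): consume one message starting
 * at *pos and stop at the next line beginning with "From " or at the end
 * of input.  *cont is set to 1 when another message follows. */
size_t collect_one(const char **pos, int *cont)
{
    assert(pos != NULL && *pos != NULL && cont != NULL);

    const char *p = *pos;
    size_t words = 0;
    int first_line = 1;

    *cont = 0;
    while (*p != '\0') {
        const char *eol = strchr(p, '\n');
        size_t len = eol ? (size_t)(eol - p) : strlen(p);

        /* A "From " line after the first line starts the next message. */
        if (!first_line && strncmp(p, "From ", 5) == 0) {
            *cont = 1;
            break;
        }
        words += count_words(p, len);
        first_line = 0;
        p += len;
        if (*p == '\n')
            p++;
    }
    *pos = p;
    return words;
}

/* Accumulate the per-message results into one cumulative total. */
size_t register_all(const char *mbox, size_t *msgcount)
{
    size_t total = 0;
    int cont;

    *msgcount = 0;
    do {
        total += collect_one(&mbox, &cont);
        ++*msgcount;
    } while (cont);
    return total;
}
```

Each pass through the do/while handles one message, and the totals are only 
combined after collect_one() returns.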

When classifying messages, the message body can contain lines starting with 
"^From ".  In a mailbox those message body lines are escaped (typically as 
">From ").  Without the loop, such a line would terminate parsing the message.
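
As an illustration of the escaping mentioned above, here is a minimal 
sketch of how a body line could be escaped before being written to a 
mailbox.  escape_from_line() is a hypothetical helper for this example, 
not a bogofilter function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy line into out, prefixing '>' when the line begins with "From ".
 * Returns the number of characters written (excluding the terminating
 * NUL), or -1 if out is too small to hold the result. */
int escape_from_line(const char *line, char *out, size_t outsz)
{
    assert(line != NULL && out != NULL);

    const char *prefix = (strncmp(line, "From ", 5) == 0) ? ">" : "";
    int n = snprintf(out, outsz, "%s%s", prefix, line);

    return (n < 0 || (size_t)n >= outsz) ? -1 : n;
}
```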

Summary:  it's correct.

>Secondly, calling bogofilter(double *) twice isn't safe. It gives
>different results depending on how often it's been called.  This is
>puzzling to me. Anyone know why that would be so??
>
>[root db]# (echo spam/8/1048940498 ; echo spam/8/1048945768) | bogofilter -c /tmp/bogofilter.cf -d d/0 -v -b
>spam/8/1048940498 X-Bogosity: No, tests=bogofilter, spamicity=0.841708, version=0.11.1.8
>spam/8/1048945768 X-Bogosity: No, tests=bogofilter, spamicity=0.637994, version=0.11.1.8
>[root db]# (echo spam/8/1048945768 ; echo spam/8/1048940498) | bogofilter -c /tmp/bogofilter.cf -d d/0 -v -b
>spam/8/1048945768 X-Bogosity: No, tests=bogofilter, spamicity=0.787672, version=0.11.1.8
>spam/8/1048940498 X-Bogosity: No, tests=bogofilter, spamicity=0.821849, version=0.11.1.8

Sounds like an uninitialized variable...  Evidently the first cut of the 
bulk_mode patch isn't quite right.  I'll take a look.
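
One way results can come to depend on call order is state that is set up 
once and never reset between calls.  The sketch below is only an 
illustration of that general pattern, not a diagnosis of the bulk_mode 
patch; score_buggy(), score_fixed() and the running_* variables are 
invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical file-scope state.  It is zeroed once at program start and
 * score_buggy() never resets it, so earlier calls leak into later ones. */
static double running_sum = 0.0;
static size_t running_n = 0;

/* Averages the probabilities, but across ALL calls made so far. */
double score_buggy(const double *probs, size_t n)
{
    assert(probs != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        running_sum += probs[i];
        running_n++;
    }
    return running_n ? running_sum / (double)running_n : 0.0;
}

/* Same calculation with per-call state, so every call is independent. */
double score_fixed(const double *probs, size_t n)
{
    assert(probs != NULL || n == 0);

    double sum = 0.0;

    for (size_t i = 0; i < n; i++)
        sum += probs[i];
    return n ? sum / (double)n : 0.0;
}
```

With score_buggy() the second message's score depends on which message was 
scored first; with score_fixed() it does not.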

>most curious. (the '-b' flag just says to call bogofilter for each
>file on STDIN).

The bulk_mode patch provides two ways for bogofilter to classify multiple 
files.  With '-B' the filenames are given on the command line.  As the 
command line is of limited length, that limits the number of files that can 
be processed.  With '-b' the filenames are read from STDIN, perhaps from an 
"ls Maildir | bogofilter -b" command.  With STDIN there is no limit on the 
number of files that can be processed.
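
A minimal sketch of the '-b' style of input, reading one filename per line 
from a stream.  next_name() and process_names() are hypothetical names for 
this example and the per-file classification is left out; the actual patch 
may be structured differently.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read the next non-empty line from in into buf, without its line ending.
 * Returns 1 when a name was read and 0 at end of input.  A line longer
 * than bufsz - 1 characters would be split across calls in this sketch. */
int next_name(FILE *in, char *buf, size_t bufsz)
{
    assert(in != NULL && buf != NULL && bufsz > 1);

    while (fgets(buf, (int)bufsz, in) != NULL) {
        buf[strcspn(buf, "\r\n")] = '\0';
        if (buf[0] != '\0')
            return 1;
    }
    return 0;
}

/* Walk every name on the stream.  A real caller would open and classify
 * each file where the counter is incremented. */
size_t process_names(FILE *in)
{
    char name[4096];
    size_t count = 0;

    while (next_name(in, name, sizeof name))
        count++;
    return count;
}
```

Because the names arrive one line at a time, the command line length limit 
that applies to '-B' never comes into play.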


>Michael, reading source.




