bulk mode
bogofilter at bobvincent.org
Tue May 6 18:43:55 CEST 2003
On Tue, May 06, 2003 at 07:49:51AM -0400, David Relson wrote:
> At 04:28 AM 5/6/03, bogofilter at bobvincent.org wrote:
>
> >On Mon, May 05, 2003 at 09:43:21PM -0400, David Relson wrote:
> >> No. You can loop over files in a maildir and process them one at a
> >> time. Bogofilter-0.12 has bulk mode switches ('-b' and '-B') which
> >> can be used to make maildir operations faster - assuming you can
> >> meaningfully process more than one file in a batch.
> >
> >Ah, but they only work for classifying messages, not for registering
> >them.
> >
> >Attempting to use the bulk mode switches when registering mail as
> >spam/nonspam results in a segfault. (attempted file operation on null
> >file pointer).
>
> bogofilter should _never_ segfault. Is this a new discovery, or have you
> known about it? I'll take a look to see what's happening.
Does, dude. Latest CVS checkout. Took several hours with the
debugger to figure out why, though. Been a long time.
From bogoconfig.c:

    case 'b':
        bulk_mode = B_STDIN;
        fpin = NULL;    /* Ensure that input file isn't stdin */
        break;

From main.c:

    if (run_type & (RUN_NORMAL | RUN_UPDATE)) {
        exitcode = classify(argc, argv, out);
    }
    else {
        register_messages(run_type);
        exitcode = 0;
    }

When registering spam, run_type is 4.
When registering nonspam, run_type is 8.
Neither value has the RUN_NORMAL or RUN_UPDATE bit set, so both take
the else branch.
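
To make the branch selection concrete, here is a rough sketch of the
bitmask test. Only the values 4 and 8 come from the debugger session
above; the names REG_SPAM and REG_GOOD and the values for RUN_NORMAL and
RUN_UPDATE are assumptions for illustration, as is the helper function.

```c
/* 4 and 8 are the values observed above; every name and the other two
 * values are assumed for illustration only. */
enum run_type {
    RUN_NORMAL = 1,   /* assumed */
    RUN_UPDATE = 2,   /* assumed */
    REG_SPAM   = 4,   /* observed when registering spam */
    REG_GOOD   = 8    /* observed when registering nonspam */
};

/* Hypothetical helper: returns 1 if main() would call classify(),
 * 0 if it would call register_messages(). */
static int takes_classify_path(int run_type)
{
    return (run_type & (RUN_NORMAL | RUN_UPDATE)) != 0;
}
```

Under those assumptions, takes_classify_path(4) and
takes_classify_path(8) are both 0, so registration never reaches the
code inside classify().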
Now follow through where main() calls register_messages() which calls
collect_words() which calls get_token() ...
...
which eventually calls xfgetsl() which does the following:

    if (feof(s))
        return (EOF);

which segfaults because along the way, "s" is a reference to "fpin",
which is still NULL, and feof(0) is undefined.
So I patched it to read:

    if (!s || feof(s))
        return (EOF);

but back in lexer.c, the result gets assigned to the variable
"count", and we have:

    if (count == -1) {
        if (ferror(fpin)

which segfaults again because ferror(0) is undefined.
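
Both crashes come from the same thing: asking stdio about a stream that
isn't there. A minimal sketch of the guard on both sides of the call
follows. These are hypothetical helpers, not bogofilter's xfgetsl() or
its lexer code; the names read_line_checked() and stream_failed() are
made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a line reader: returns the length of the
 * line read into buf, or EOF on end-of-file, read error, or NULL stream. */
static int read_line_checked(char *buf, int size, FILE *s)
{
    assert(buf != NULL && size > 1);

    /* feof(NULL) is undefined, so test the pointer first. */
    if (s == NULL || feof(s))
        return EOF;

    if (fgets(buf, size, s) == NULL)
        return EOF;

    return (int)strlen(buf);
}

/* Caller side: ferror(NULL) is just as undefined, so only consult
 * ferror() when there is a stream to ask about. */
static int stream_failed(FILE *s)
{
    return s != NULL && ferror(s);
}
```

With a NULL stream the reader returns EOF and stream_failed() returns 0,
so the caller sees an ordinary end of input instead of crashing. That
only hides the symptom, of course; the stream is still never opened.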
Now back in main.c, we have this code:

    case B_STDIN:    /* '-b' - streaming (stdin) mode */
    {
        size_t len;
        filename = buff;
        if (fgets(buff, sizeof(buff), stdin) == 0) {
            done = true;
            continue;
        }
        len = strlen(filename);
        if (len > 0 && filename[len-1] == '\n')
            filename[len-1] = '\0';
        break;
    }
    }

But it's in the classify() path, which never gets called when
registering. There is no equivalent code in the path that starts with
register_messages().
So I figured it wasn't DESIGNED to operate in bulkmode when
registering messages, only when classifying them.
Silly me.
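
For what it's worth, a registration path that honoured '-b' would need
its own copy of that filename loop. Here is a rough sketch of the shape
it might take. It is not bogofilter code: for_each_listed_file(), the
message_fn callback type and count_message() are all hypothetical, and
the real thing would hand each opened file to the word-collection code
instead of a counter.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback type: handles one opened message file. */
typedef void (*message_fn)(FILE *msg, const char *name, void *ctx);

/* Reads one filename per line from 'list', opens each file and hands it
 * to 'fn'. Returns the number of files opened successfully. */
static int for_each_listed_file(FILE *list, message_fn fn, void *ctx)
{
    char buff[4096];
    int opened = 0;

    assert(list != NULL && fn != NULL);

    while (fgets(buff, sizeof(buff), list) != NULL) {
        size_t len = strlen(buff);
        if (len > 0 && buff[len - 1] == '\n')
            buff[len - 1] = '\0';
        if (buff[0] == '\0')
            continue;           /* skip blank lines */

        FILE *msg = fopen(buff, "r");
        if (msg == NULL) {
            perror(buff);       /* report and move on; never pass NULL down */
            continue;
        }
        fn(msg, buff, ctx);
        fclose(msg);
        opened++;
    }
    return opened;
}

/* Example callback: just counts the messages it is given. */
static void count_message(FILE *msg, const char *name, void *ctx)
{
    (void)msg;
    (void)name;
    ++*(int *)ctx;
}
```

Called as for_each_listed_file(stdin, count_message, &n), this reads the
output of "ls" one name at a time and never lets a NULL file pointer
reach the reader.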
But occasionally, on this list and in the docs, I see references which
suggest (though obviously the author hasn't tried it) that you can
register a maildir of spam with something like:

    cd Spam/cur ; ls | bogofilter -b -s

Sorry; that segfaults.
Interestingly enough, this one doesn't:

    cd Spam/cur ; bogofilter -B -s `ls`

but it doesn't work, either, because bogofilter sits forever waiting
for an email message to appear on stdin.