bulk mode
    bogofilter at bobvincent.org 
       
    Tue May  6 18:43:55 CEST 2003
    
    
  
On Tue, May 06, 2003 at 07:49:51AM -0400, David Relson wrote:
> At 04:28 AM 5/6/03, bogofilter at bobvincent.org wrote:
> 
> >On Mon, May 05, 2003 at 09:43:21PM -0400, David Relson wrote:
> >> No.  You can loop over files in a maildir and process them one at a
> >> time.  Bogofilter-0.12 has bulk mode switches ('-b' and '-B') which
> >> can be used to make maildir operations faster - assuming you can
> >> meaningfully process more than one file in a batch.
> >
> >Ah, but they only work for classifying messages, not for registering
> >them.
> >
> >Attempting to use the bulk mode switches when registering mail as
> >spam/nonspam results in a segfault.  (attempted file operation on null
> >file pointer).
> 
> bogofilter should _never_ segfault.  Is this a new discovery, or have you 
> known about it?  I'll take a look to see what's happening.
Does, dude.  Latest CVS checkout.  Took several hours with the
debugger to figure out why, though.  Been a long time.
From bogoconfig.c:
        case 'b':
          bulk_mode = B_STDIN;
          fpin = NULL;        /* Ensure that input file isn't stdin */
          break;
From main.c:
    if (run_type & (RUN_NORMAL | RUN_UPDATE)) {
      exitcode = classify(argc, argv,out);
    }
    else {
      register_messages(run_type);
      exitcode = 0;
    }
When registering spam, run_type is 4.
When registering nonspam, run_type is 8.
Neither value matches the test above, so the else branch runs.
Now follow through where main() calls register_messages() which calls
collect_words() which calls get_token() ...
...
which eventually calls xfgetsl() which does the following:
if (feof(s))
  return (EOF);
which segfaults because along the way, "s" is a reference to "fpin",
which is still NULL, and feof() on a NULL pointer is undefined.
So I patched it to read:
if (!s || feof(s))
  return (EOF);
but back in lexer.c, the result gets assigned to the variable
"count", and we have:
if (count == -1) {
    if (ferror(fpin)
which segfaults again because ferror() on a NULL pointer is undefined, too.
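
To illustrate the two crashes described above, here is a minimal sketch
of NULL-safe wrappers around feof() and ferror().  The helper names
stream_at_eof() and stream_has_error() are hypothetical and are not part
of bogofilter; the sketch only shows the kind of guard that both call
sites would need, not an actual patch.

```c
#include <stdio.h>
#include <stdbool.h>

/* Hypothetical helper (not in bogofilter): treat a missing stream as
 * "nothing left to read" instead of passing NULL to feof(). */
static bool stream_at_eof(FILE *s)
{
    return s == NULL || feof(s) != 0;
}

/* Hypothetical helper (not in bogofilter): a missing stream cannot have
 * its error indicator set, so report "no error" without calling ferror(). */
static bool stream_has_error(FILE *s)
{
    return s != NULL && ferror(s) != 0;
}
```

Guards like these only stop the crash.  With fpin still NULL there is
nothing to read, so registration would still process no messages.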
Now back in main.c, we have this code:
        case B_STDIN:           /* '-b' - streaming (stdin) mode */
        {
            size_t len;
            filename = buff;
            if (fgets(buff, sizeof(buff), stdin) == 0) {
                done = true;
                continue;
            }
            len = strlen(filename);
            if (len > 0 && filename[len-1] == '\n')
                filename[len-1] = '\0';
            break;
        }
        }
But it's in classify.c, which never gets called.  There is no
equivalent code in the path that starts with register_messages().
So I figured it wasn't DESIGNED to operate in bulk mode when
registering messages, only when classifying them.
Silly me.
But occasionally, on this list and in the docs, I see references which
suggest (though obviously the author hasn't tried it) that you can
register a maildir of spam with something like:
cd Spam/cur ; ls | bogofilter -b -s
Sorry; that segfaults.
Interestingly enough, this one doesn't:
cd Spam/cur ; bogofilter -B -s `ls`
but it doesn't work, either, because bogofilter sits forever waiting
for an email message to appear on stdin.
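
For illustration only, here is a rough sketch of what the registration
path appears to be missing: a loop modeled on the B_STDIN code quoted
from classify.c, which reads one filename per line and opens each file
before handing it on.  The names for_each_listed_file(), message_fn and
count_bytes() are hypothetical and do not exist in bogofilter; the
callback stands in for whatever would actually register the message.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical callback type: handles one opened message file. */
typedef void (*message_fn)(FILE *msg, const char *name, void *ctx);

/* Read one filename per line from `names`, open each file and pass it
 * to `handle`.  Returns the number of files opened successfully.
 * Names longer than the buffer are not handled in this sketch. */
static int for_each_listed_file(FILE *names, message_fn handle, void *ctx)
{
    char buff[4096];
    int opened = 0;

    while (fgets(buff, sizeof(buff), names) != NULL) {
        size_t len = strlen(buff);
        if (len > 0 && buff[len - 1] == '\n')
            buff[len - 1] = '\0';       /* strip the trailing newline */
        if (buff[0] == '\0')
            continue;                   /* skip blank lines */

        FILE *msg = fopen(buff, "r");
        if (msg == NULL) {
            perror(buff);               /* report and keep going */
            continue;
        }
        handle(msg, buff, ctx);
        fclose(msg);
        opened++;
    }
    return opened;
}

/* Example callback for demonstration: adds the size of each message,
 * in bytes, to the long that `ctx` points at. */
static void count_bytes(FILE *msg, const char *name, void *ctx)
{
    long *total = ctx;
    (void)name;
    while (fgetc(msg) != EOF)
        (*total)++;
}
```

Called as for_each_listed_file(stdin, count_bytes, &total), this reads
the same kind of input that "ls | bogofilter -b" supplies.  The point is
that each file is opened explicitly, so nothing downstream depends on a
NULL fpin.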
    
    