mbox v maildir problems.

michael at optusnet.com.au
Sat Sep 13 12:56:43 CEST 2003


I've got a little problem with 0.15.3 (it's
in earlier versions too).

I've got a lot of emails in a MH format
directory.

$ find dir -type f |grep ... | wc -l 
 24437
$ find dir -type f |grep ... | bogofilter -v -s -b
# 353626 words, 24477 messages


Bad. It somehow found an extra 40 messages somewhere!
The problem is that these files are in MH format so
'^From ' isn't escaped.
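
To make the failure concrete, here is a minimal sketch in C (not
bogofilter code; count_from_lines is a hypothetical helper written for
this illustration). It counts lines that begin with "From ". Every such
line in a message body looks like a message separator to a reader in
mbox mode, whereas mbox-style quoting would have stored it as ">From ".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of bogofilter: count the lines in
 * `text` that begin with "From ". A reader in mbox mode would treat
 * each of these lines as the start of a new message. */
size_t count_from_lines(const char *text)
{
    size_t count = 0;
    const char *line = text;

    while (line != NULL && *line != '\0') {
        /* Only a match at the very start of a line counts. */
        if (strncmp(line, "From ", 5) == 0)
            count++;

        /* Advance to the character after the next newline, if any. */
        const char *nl = strchr(line, '\n');
        line = (nl != NULL) ? nl + 1 : NULL;
    }
    return count;
}
```

An MH file whose body contains one unescaped "From " line would be
counted as one message more than it really holds, which is the kind of
overcount shown above.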

That's fine. That's why bogofilter has the '-M'
off by default.

BUT: bogoreader.c has
[...]
/* global reader initialization, exported */
void bogoreader_init(int _argc, char **_argv)
{
    mailstore_first = mail_first = true;
    reader_more = reader__next_mail;
    fini = dummy_fini;
    if (run_type & (REG_SPAM|REG_GOOD|UNREG_SPAM|UNREG_GOOD))
        mbox_mode = true;
[...]

which forces mbox mode on while training! argh!!! :)

This is a trivial fix. Those last two lines (the run_type test and
the mbox_mode assignment) just shouldn't be there. The problem is
that removing them will change the behaviour of bogofilter for people
who currently train on mbox files without passing '-M'.

A half-way house would be to move those two lines into
the B_NORMAL case of the switch that follows. (So we only
turn on mbox_mode when we're not doing bulk mode.)
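
As a rough, standalone model of that half-way house (a sketch only,
not a patch against bogoreader.c): the names run_type, mbox_mode,
B_NORMAL and the REG_*/UNREG_* flags come from the excerpt above, but
their values here, the B_BULK_HYPOTHETICAL case, the mbox_flag
parameter standing in for '-M', and the choose_mbox_mode function are
all assumptions made for illustration.

```c
#include <stdbool.h>

/* Stand-in flag values; the real definitions live in bogofilter. */
enum { REG_SPAM = 1, REG_GOOD = 2, UNREG_SPAM = 4, UNREG_GOOD = 8 };

/* B_NORMAL is named in the post; the other case is a placeholder
 * for whatever bulk modes the real switch handles. */
typedef enum { B_NORMAL, B_BULK_HYPOTHETICAL } bulk_mode_t;

/* Decide whether the reader should run in mbox mode.
 * mbox_flag models an explicit '-M' on the command line. */
bool choose_mbox_mode(bulk_mode_t bulk, int run_type, bool mbox_flag)
{
    bool mbox_mode = mbox_flag;

    switch (bulk) {
    case B_NORMAL:
        /* Keep the old behaviour: training forces mbox mode,
         * but only when not in bulk mode. */
        if (run_type & (REG_SPAM | REG_GOOD | UNREG_SPAM | UNREG_GOOD))
            mbox_mode = true;
        break;
    default:
        /* Bulk mode: leave mbox_mode as the user asked for it. */
        break;
    }
    return mbox_mode;
}
```

Under this model, training in bulk mode on one-message-per-file input
leaves mbox mode off unless it is requested explicitly, while
non-bulk training behaves as it does today.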

All options are horrible. :)

Comments?

Michael.



