mbox v maildir problems.
michael at optusnet.com.au
Sat Sep 13 12:56:43 CEST 2003
I've got a little problem with 0.15.3 (it's
in earlier versions too).
I've got a lot of emails in an MH-format
directory.
$ find dir -type f |grep ... | wc -l
24437
$ find dir -type f |grep ... | bogofilter -v -s -b
# 353626 words, 24477 messages
Bad. It somehow found an extra 40 messages somewhere!
The problem is that these files are in MH format, so body
lines matching '^From ' aren't escaped.
That's fine. That's why bogofilter has '-M'
off by default.
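To illustrate where the extra 40 messages (24477 - 24437) could come from, here is a minimal sketch of mbox-style splitting. It is not bogofilter's actual reader; the function name count_mbox_separators is made up for illustration, and real mbox readers often also require a preceding blank line. It simply treats every line starting with "From " as the start of a new message:

```c
#include <stddef.h>
#include <string.h>

/* Count the lines in buf that begin with "From ".
 * A naive mbox reader starts a new message at each such line,
 * so an MH file with an unescaped "From " body line is split in two.
 * (Hypothetical helper, not taken from bogofilter.) */
size_t count_mbox_separators(const char *buf)
{
    size_t count = 0;
    const char *line = buf;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, "From ", 5) == 0)
            count++;
        line = strchr(line, '\n');   /* find the end of this line */
        if (line != NULL)
            line++;                  /* step past the newline */
    }
    return count;
}
```

A single MH message whose body contains a line such as "From now on ..." would make a reader like this see one more message than there are files. An escaped ">From " line is not counted.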
BUT: bogoreader.c has
[...]
/* global reader initialization, exported */
void bogoreader_init(int _argc, char **_argv)
{
    mailstore_first = mail_first = true;
    reader_more = reader__next_mail;
    fini = dummy_fini;
    if (run_type & (REG_SPAM|REG_GOOD|UNREG_SPAM|UNREG_GOOD))
        mbox_mode = true;
[...]
which forces mbox mode on while training! argh!!! :)
This is a trivial fix. Those last two lines just shouldn't
be there. The problem is that removing them will change the
behaviour of bogofilter for people who are currently not
passing '-M' to bogofilter.
A half-way house would be to move those two lines into
the B_NORMAL case of the switch that follows. (So we only
turn on mbox_mode when we're not doing bulk mode.)
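The half-way house can be sketched as a small decision function. This is a toy model, not a patch: the bulk_t enum, the B_BULK value and the want_mbox_mode name are stand-ins invented here, and it assumes that an explicit '-M' should always be honoured.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; not bogofilter's actual definitions. */
typedef enum { B_NORMAL, B_BULK } bulk_t;

/* Decide whether the reader should split its input mbox-style.
 * registering: true when training (registering or unregistering mail)
 * bulk:        B_NORMAL for a single input, B_BULK for a file list
 * dash_M:      true when the user explicitly asked for mbox mode */
bool want_mbox_mode(bool registering, bulk_t bulk, bool dash_M)
{
    if (dash_M)
        return true;                 /* assumed: explicit request wins */
    /* Force mbox mode for training only when not in bulk mode. */
    return registering && bulk == B_NORMAL;
}
```

Under this model, training on a single input without '-M' still gets mbox splitting as it does today, while a bulk run over MH files is left alone unless '-M' is given.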
All options are horrible. :)
Comments?
Michael.