wordlists and lexer

David Relson relson at osagesoftware.com
Tue Nov 23 02:53:01 CET 2004


On Tue, 23 Nov 2004 02:00:51 +0100
Matthias Andree wrote:

> David Relson <relson at osagesoftware.com> writes:
> 
> >> Because we're halfway in the middle of the changes. Documenting the
> >> limitation would be "you cannot use multiple wordlists", disable the
> >> --wordlist option, revert the two large commits and move on - not an
> >> option, as it seems.
> >
> > No.  It's "you cannot use multiple wordlists with transactions" due
> > to limitations in BerkeleyDB's environment.  That you can work
> > around the limitations is nice, but it's added complexity.
> 
> I think I'll review and back out the large database environment
> commits and replace the --wordlist option by a message "You cannot use
> multiple wordlists with Berkeley DB Transactional Data Store. See
> section X.Y in file README.abc for details." This is a larger task and
> it's bedtime now, and some of the fixes entailed in the larger updates
> we'll want to keep.

A test "if transactions and count(wordlists) > 1" would be fine.  I'll
add it if you don't care to.
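
A minimal sketch of such a test in C.  The function and parameter names
are hypothetical and are not taken from the bogofilter sources; how the
transaction setting and the wordlist count are actually stored is an
assumption here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: neither the name nor the parameters come from
 * the bogofilter sources.  Returns true when the configuration is
 * usable, false for the one unsupported combination. */
bool wordlist_config_ok(bool transactions, size_t wordlist_count)
{
    /* Only "transactions AND more than one wordlist" is rejected.
     * One wordlist with transactions, or several wordlists without
     * transactions, remain valid. */
    return !(transactions && wordlist_count > 1);
}
```

The caller would print the "You cannot use multiple wordlists with
Berkeley DB Transactional Data Store" message and exit when this
returns false, which leaves the --wordlist option itself in place.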

...[snip]...
> 
> Yes, but crashing on bogus input is also beyond its scope. It runs in
> a mail environment, and any bug that is triggered by bogus input can
> also be triggered by a remote user. We absolutely must not allow
> SIGSEGV here.
> 
> I'll go fix this bug by just ignoring the line and reading the next
> token - someone else can then fix the lexer so we don't pass data down
> that beats the crap out of collect.c.

Bogofilter's purpose is to filter messages that have passed through an
MDA.  For those messages, it needs to be crash proof.

The message-count format, a.k.a. BOGO_LEX format, was created to speed
up testing by shifting the cost of database open/lookup to a
pre-processing pass.  Messages that have passed through an MDA always
have a message header (with keyword, colon, and value), don't they? Such
messages don't trigger BOGO_LEX mode.  As part of bogofilter's parsing,
its lexer can only enter BOGO_LEX mode if the first input line is in the
proper format.
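
A rough sketch of that first-line gate in C.  This is not the code in
lexer_v3.l: the marker string below is a placeholder, not the real
BOGO_LEX first line, and the function and enum names are made up for
illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Assumption: placeholder only, NOT the real BOGO_LEX first line. */
#define SKETCH_MARKER "<hypothetical-bogo-lex-marker>"

enum lex_mode { MODE_MAIL, MODE_BOGO_LEX };

/* True if the line looks like "keyword: value", i.e. a non-empty
 * keyword without whitespace, immediately followed by a colon. */
bool looks_like_header(const char *line)
{
    size_t n = strcspn(line, ": \t\r\n");
    return n > 0 && line[n] == ':';
}

/* Decide the mode from the first input line only.  Anything that is
 * not exactly the marker (ignoring the line ending) is treated as
 * ordinary mail, so a header line can never select BOGO_LEX mode. */
enum lex_mode choose_mode(const char *first_line)
{
    size_t len = strcspn(first_line, "\r\n");

    if (len == strlen(SKETCH_MARKER) &&
        strncmp(first_line, SKETCH_MARKER, len) == 0)
        return MODE_BOGO_LEX;
    return MODE_MAIL;
}
```

With a gate of this shape, a first line such as "Received: from ..."
selects mail mode, and later lines are never consulted for the mode
decision.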

> >> I believe it only takes a tiny bug in lexer_v3.l to break nearly
> >> every assumption that the consumers make.
> >
> > No different from anywhere else in this program or any other
> > program.  A small error will always be able to cause a large
> > negative effect.
> 
> We cannot allow this in a mail scoring application. I don't mind if
> bogotune or bogoutil barfs on a non-crucial function once in a while,
> we can fix this after the bug report, but it poses no immediate danger
> to the end user's system.
> 
> Bogofilter itself must not crash, particularly not when running in
> some non-registering mode, else the mail system might be unable to
> make any progress with its duties - and parsers are crash-prone.

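As explained above, this is not a concern: mail that has passed through
an MDA starts with a header line, so it never puts the lexer into
BOGO_LEX mode.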

> I fear 0.92.9 has already set out as a bugfix update. We'll see if
> something knocks on our doors.



More information about the bogofilter-dev mailing list