[bogofilter-dev] [firstname.lastname@example.org: Bug#247434: bogofilter segfaults with Invalid buffer size, exiting.]
relson at osagesoftware.com
Thu May 27 16:22:15 EDT 2004
On Thu, 27 May 2004 22:06:43 +0200
Soeren Sonnenburg wrote:
> On Wed, 2004-05-26 at 15:21, David Relson wrote:
> > On Wed, 26 May 2004 08:27:24 +0200
> > Soeren Sonnenburg wrote:
> > An interesting message. I don't have a complete answer yet, but
> > I've noticed a thing or two.
> > First, what you sent looks like a mailbox file, except that it's
> > lacking the envelope lines ("From who at wherever.com date"). After I
> > added the envelopes to convert it to a standard .mbx file,
> > bogofilter worked great. I've attached a patch that puts it in
> > standard form.
> Actually my bogofilter usage pattern is to recreate its wordlist from
> scratch every once in a while, which is why I keep almost all spam
> (except for the ones that get high scores anyway).
> However, as I have all this in a Maildir-like setup, it would take a
> year to process if I did not use this find -print0 | xargs -0
> bogofilter... trick. That is why 2 imperfect messages could be
Bogofilter understands maildirs. Provide a dir (or multiple dirs) on
the command line. Something like the following should work:
bogofilter -s -B `find . -type d`
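For illustration only (this uses Python's standard mailbox module, not bogofilter's own code): in a Maildir each message is a separate file under cur/ or new/, so nothing has to mark where one message ends and the next begins. The helper name maildir_subjects and the "spam" folder name are made up for the example.

```python
import mailbox
import tempfile
from pathlib import Path


def maildir_subjects(root: str) -> list[str]:
    """Return the sorted Subject headers of every message in the Maildir at root.

    Each message is its own file under cur/ or new/, so no "From "
    separator lines are needed to tell messages apart.
    """
    md = mailbox.Maildir(root, create=False)
    return sorted(msg["Subject"] for msg in md)


# Usage: build a throwaway Maildir holding two messages and read it back.
with tempfile.TemporaryDirectory() as tmp:
    root = str(Path(tmp) / "spam")  # hypothetical folder name
    md = mailbox.Maildir(root, create=True)
    md.add("Subject: one\n\nfirst body\n")
    md.add("Subject: two\n\nsecond body\n")
    subjects = maildir_subjects(root)
    print(subjects)  # ['one', 'two']
```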
> Maybe one way would be to use formail or similar to always enforce correct
> headers before final delivery to a Maildir, but that would be just a
> workaround IMO.
Concatenating messages that lack proper "From " envelopes isn't going to
work. Bogofilter must have the "From " lines as separators in an mbox.
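A minimal Python sketch of why the envelopes matter, assuming the simplest mbox convention that any line beginning with "From " starts a new message (real mbox variants also escape body lines as ">From " and may require a preceding blank line, which this ignores). split_mbox is a hypothetical helper, not part of bogofilter:

```python
def split_mbox(text: str) -> list[str]:
    """Split mbox text into messages at lines beginning with "From ".

    Simplified: ignores ">From " escaping and blank-line rules.
    """
    messages: list[str] = []
    current: list[str] = []
    for line in text.splitlines(keepends=True):
        # An envelope line closes the previous message and starts a new one.
        if line.startswith("From ") and current:
            messages.append("".join(current))
            current = []
        current.append(line)
    if current:
        messages.append("".join(current))
    return messages


with_envelopes = (
    "From alice@example.org Thu May 27 16:22:15 2004\n"
    "Subject: one\n\nfirst body\n"
    "From bob@example.org Thu May 27 16:23:00 2004\n"
    "Subject: two\n\nsecond body\n"
)
# The same two messages simply concatenated, with no envelope lines.
without_envelopes = "Subject: one\n\nfirst body\nSubject: two\n\nsecond body\n"

print(len(split_mbox(with_envelopes)))     # 2
print(len(split_mbox(without_envelopes)))  # 1
```

Without the envelope lines both messages come back as one, which is what happens when messages lacking "From " lines are simply concatenated.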
> > Second, bogofilter is filling its parsing buffer because it's trying
> > to complete a very long html tag. The message contains
> > X-UIDL: DLJ!!PJ<!!8QS!!*l1!!
> > and the lexer sees the "<" and changes into html-tag mode.
> > Unfortunately there's no matching ">" to end the tag, so bogofilter
> > says "Invalid buffer size, exiting." and quits.
> > I can force an EOF rather than quitting. An EOF has the side effect
> > of ending _all_ processing of the file, so it could cause bogofilter
> > to stop processing a real mailbox.
> That would be pretty bad, as it would end scanning the whole mbox. Why
> can't you just set a limit on the size of an html tag? I don't know
> whether there are any official limits documented, but it is probably a
> good idea not to make it larger than, say, 1k...
In general, the parsing component, specifically lexer_v3.l, knows
nothing about maximum lengths. Processing unknown html tags uses an
HDISCARD state not presently known outside of lexer_v3.l. Possibly a
flag can be set that will let bogofilter better handle b0rked messages.
I'll see what I can do :-)
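A rough sketch of the length-cap idea, written in Python rather than as flex rules, so it is not how lexer_v3.l actually works. MAX_TAG_LEN (1024, following the "say 1k" suggestion) and scan_tags are hypothetical names. When no ">" appears within the cap, the "<" is treated as ordinary text and scanning continues, instead of exiting or forcing an EOF that would end the rest of the mailbox:

```python
MAX_TAG_LEN = 1024  # hypothetical cap, per the "say 1k" suggestion


def scan_tags(text: str, max_tag_len: int = MAX_TAG_LEN) -> tuple[list[str], list[str]]:
    """Split text into (tags, text_chunks), bounding how far a tag may extend."""
    tags: list[str] = []
    chunks: list[str] = []
    buf: list[str] = []
    i = 0
    while i < len(text):
        if text[i] == "<":
            # Look for the closing '>' only within the next max_tag_len characters.
            end = text.find(">", i + 1, i + 1 + max_tag_len)
            if end != -1:
                if buf:
                    chunks.append("".join(buf))
                    buf = []
                tags.append(text[i:end + 1])
                i = end + 1
                continue
            # No '>' within the cap: treat '<' as plain text and keep scanning.
        buf.append(text[i])
        i += 1
    if buf:
        chunks.append("".join(buf))
    return tags, chunks


print(scan_tags("x<b>bold</b>y"))  # (['<b>', '</b>'], ['x', 'bold', 'y'])
# The X-UIDL line from the problem message no longer swallows the rest of the input.
print(scan_tags("X-UIDL: DLJ!!PJ<!!8QS!!*l1!!\n"))
```

One trade-off: a stray "<" followed by an unrelated ">" within the cap (for example a later "<user@host>" address) is still swallowed as a tag; the cap only bounds the damage.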
> > Anyhow, I thought I'd bring you up to date while I determine the
> > best way to deal with this.
> Thanks a lot.
> > Regards,
> > David
> > P.S. When sending a message that causes problems like this, it's
> > best to gzip the message. That way the attached message doesn't
> > trigger bogofilter's processing.
> Ahh, my bad; I did not even think about these consequences...
A minor problem, not a big one :-)
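The command-line gzip tool is all that is needed for this; for anyone who wants to script it, a minimal equivalent using Python's standard gzip module might look like the following (gzip_message and problem.eml are hypothetical names). The attachment is then binary gzip data rather than the raw message text:

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gzip_message(src: Path) -> Path:
    """Write a gzip-compressed copy of src as <name>.gz and return its path."""
    dst = src.with_name(src.name + ".gz")
    with src.open("rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)  # streams, so large mailboxes are fine
    return dst


# Usage: compress a throwaway copy of a problem message before attaching it.
with tempfile.TemporaryDirectory() as tmp:
    msg = Path(tmp) / "problem.eml"  # hypothetical file name
    msg.write_bytes(b"X-UIDL: DLJ!!PJ<!!8QS!!*l1!!\n\nbody\n")
    gz = gzip_message(msg)
    print(gz.name)  # problem.eml.gz
```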