[bogofilter-dev] [bugreports at nn7.de: Bug#247434: bogofilter segfaults with Invalid buffer size, exiting.]

David Relson relson at osagesoftware.com
Thu May 27 22:22:15 CEST 2004


On Thu, 27 May 2004 22:06:43 +0200
Soeren Sonnenburg wrote:

> On Wed, 2004-05-26 at 15:21, David Relson wrote:
> > On Wed, 26 May 2004 08:27:24 +0200
> > Soeren Sonnenburg wrote:
> [...]
> > An interesting message.  I don't have a complete answer yet, but
> > I've noticed a thing or two.
> > 
> > First, what you sent looks like a mailbox file, except that it's
> > lacking the envelope lines ("From who at wherever.com date").  After I
> > added the envelopes to convert it to a standard .mbx file,
> > bogofilter worked great.  I've attached a patch that puts it in
> > standard form.
> 
> Actually my bogofilter usage pattern is to recreate its wordlist from
> scratch every once in a while, which is why I keep almost all spam
> (except for the messages that get high scores anyway).
> However, as I have all this in a Maildir-like setup, it would take a
> year to process if I did not use the find -print0 | xargs -0
> bogofilter... trick. That is why 2 imperfect messages could be
> joined...

Hi Soeren,

Bogofilter understands maildirs.  Provide a dir (or multiple dirs) on
the command line.  Something like the following should work:

  bogofilter -s -B `find . -type d`

> Maybe one way would be to use formail or so to always enforce correct
> headers before final delivery to a Maildir, but that would be just a
> workaround IMO.

Concatenating messages that lack proper "From " envelopes isn't going to
work.  Bogofilter must have the "From " lines as separators in an mbox.

> > Second, bogofilter is filling its parsing buffer because it's trying
> > to complete a very long html tag.  The message contains 
> > 
> >   X-UIDL: DLJ!!PJ<!!8QS!!*l1!!
> > 
> > and the lexer sees the "<" and changes into html-tag mode. 
> > Unfortunately there's no matching ">" to end the tag, so bogofilter
> > says "Invalid buffer size, exiting." and quits.
> > 
> > I can force an EOF rather than quitting.  An EOF has the side effect
> > of ending _all_ processing of the file, so it could cause bogofilter
> > to stop processing a real mailbox.
> 
> That would be pretty bad as it would end scanning the whole mbox. Why
> can't you just set a limit on the size of an html tag? I don't know
> whether there are any official limits documented but it is probably a
> good idea to not make it larger than say 1k...

In general, the parsing component, specifically lexer_v3.l, knows
nothing about maximum lengths.  Processing unknown html tags uses an
HDISCARD state not presently known outside of lexer_v3.l.  Possibly a
flag can be set that will let bogofilter better handle b0rked messages.

I'll see what I can do :-)
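
For illustration, the length cap suggested above could look roughly like
the Python sketch below.  It is not how lexer_v3.l works and not a
proposed patch; strip_tags and MAX_TAG_LEN are hypothetical names, and
the 1024 value simply follows the "say 1k" suggestion.  The idea is that
a "<" only starts a tag if a ">" follows within the cap; otherwise the
"<" is kept as ordinary text and scanning continues.

```python
MAX_TAG_LEN = 1024  # hypothetical cap, following the "say 1k" suggestion


def strip_tags(text, max_tag_len=MAX_TAG_LEN):
    """Drop <...> tags; keep '<' as text if no '>' follows within the cap."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "<":
            # Look for the closing '>' only within max_tag_len characters.
            end = text.find(">", i + 1, i + 1 + max_tag_len)
            if end != -1:
                i = end + 1  # complete tag: discard it
                continue
            # No '>' within the cap: fall through and keep '<' as text.
        out.append(text[i])
        i += 1
    return "".join(out)


header = "X-UIDL: DLJ!!PJ<!!8QS!!*l1!!"
print(strip_tags(header) == header)
print(strip_tags("<b>hi</b>"))
```

In this sketch the X-UIDL header from the problem message passes through
unchanged, since its "<" is never closed, while a normal short tag is
still discarded.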

> > Anyhow, I thought I'd bring you up to date while I determine the
> > best way to deal with this.
> 
> Thanks a lot.
> Soeren
> 
> > Regards,
> > 
> > David
> > 
> > P.S.  When sending a message that causes problems like this, it's
> > best to gzip the message.  That way the attached message doesn't
> > trigger bogofilter's processing.
> 
> ahh, bad I did not even think about these bad consequences...

A minor problem, not a big one :-)

Regards,

David


