fatal flex scanner internal error--end of buffer missed
David Relson
relson at osagesoftware.com
Thu Sep 4 18:27:01 CEST 2003
On Thu, 04 Sep 2003 11:05:23 -0500
"Karl O. Pinc" <kop at meme.com> wrote:
>
> On 2003.09.04 07:25 David Relson wrote:
> > Karl,
> >
> > I've looked at all 5 of the messages. Each begins with a normal
> > "From" line, followed by normal message headers, followed by a
> > normal body, followed by additional "Status: RO", "Content-Length:",
> > and "Lines:" header lines. These messages are unusual. I'm not sure
> > whether they comply with the standards or not. What's their origin?
> >
> > For example, in #19041 lines 37 to 82 are base64 encoded text.
> > Lines 83 to 85 are:
> >
> > Status: RO
> > Content-Length: 6224
> > Lines: 157
>
> Please, no need to apologize. Y'all are doing _me_ a favor with all
> the work you've done. (And I've worked around the problem.)
>
> I rebuilt from the srpm just on principle, no worries about library
> compatibility etc. (If there's a build requirement autoconf doesn't
> grok you can always use the "Build-Requires:" specfile tag to avoid
> problems.)
> (I find "rpm --rebuild" the best idiom for installing software not
> specific to my distro release.)
>
> (FYI: rpm -q flex --> flex-2.5.4a-1)
>
> The messages are from my saved spam mbox. I found them while
> training. Very likely these are not standards conformant messages.
> I've been collecting spam for years and have used various mail
> clients. Of late I find I can't wean myself from the GUI and am using
> balsa (at the moment balsa-1.2.4-7.7.2, but I have used older
> versions). I suspect the client has sometimes corrupted the mailbox,
> maybe when mailboxes get really large like my latest spam box
> (~160MB). I noticed quite a few (20?) corrupted messages while
> carefully cleaning my spam corpus (~30,000 messages). I wouldn't
> think they _all_ were bad on arrival. I tried
> to simply delete non-conformant messages when I came across them.
>
> I _have_ seen some non-conformant spams arrive though, I suspect
> straight from a spammer with faulty software. I'd think it'd be nice
> to be able to handle them. No way to trap an exception -- a la strace
> if nothing else? :( (Gosh, haven't thought of language hacking in a
> while.)
>
> Anyhow, not a big deal. Although come to think of it I'm using the
> procmail recipe from man 1 bogofilter which rejects the delivery on
> error, so that might get me some sort of a loop should there be a
> failure. (labeled:
> # filter mail through bogofilter, tagging it as spam and
> # updating the wordlists
> )
Karl,
Bogofilter's goal is to handle standards-compliant messages. As we
discover the ways that spammers deviate from the standard (and that are
accepted by popular MUAs), we "loosen" bogofilter's interpretations. A
small number of non-compliant constructs are already understood.
Whether a message is compliant or not, bogofilter should never abort.
With procmail, an abort causes a retry, which causes another abort,
another retry, ...
Since writing to you earlier today, I dug into flex and found
YY_FATAL_ERROR, which calls yy_fatal_error(), which prints the message
and aborts. I now have a new definition of YY_FATAL_ERROR which uses
setjmp/longjmp. This at least allows bogofilter to score the message up
to the problem area and will lessen the problems.
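
A minimal sketch of the idea, not bogofilter's actual code: the only
flex name here is the YY_FATAL_ERROR macro; lexer_fatal(),
lexer_recovery_point, fake_yylex() and count_tokens() are hypothetical
names made up for illustration, and fake_yylex() is a stand-in for the
generated yylex() that treats '!' as the spot where the scanner would
hit its internal error.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>

/* Hypothetical recovery point; set by the caller before scanning. */
static jmp_buf lexer_recovery_point;

/* Report the scanner error, then jump back to the caller instead of
 * aborting the whole process. */
static void lexer_fatal(const char *msg)
{
    fprintf(stderr, "lexer error: %s\n", msg);
    longjmp(lexer_recovery_point, 1);
}

/* In a real .l file this override would sit in the definitions
 * section, so that it replaces flex's default definition. */
#define YY_FATAL_ERROR(msg) lexer_fatal(msg)

/* Stand-in for yylex(): returns 1 for each word, 0 at end of input,
 * and raises a fatal error when it meets '!'. */
static const char *cursor;

static int fake_yylex(void)
{
    while (*cursor == ' ')
        cursor++;
    if (*cursor == '\0')
        return 0;
    if (*cursor == '!')
        YY_FATAL_ERROR("end of buffer missed");
    while (*cursor != '\0' && *cursor != ' ' && *cursor != '!')
        cursor++;
    return 1;
}

/* Count tokens in text. Returns 0 on a clean scan and -1 if the
 * scanner failed; either way *tokens_seen holds the number of tokens
 * read before scanning stopped. */
int count_tokens(const char *text, int *tokens_seen)
{
    /* volatile: modified between setjmp() and longjmp(), then read
     * after the jump. */
    volatile int count = 0;

    assert(text != NULL && tokens_seen != NULL);
    cursor = text;

    if (setjmp(lexer_recovery_point) != 0) {
        /* Arrived here via longjmp(): keep the partial result. */
        *tokens_seen = count;
        return -1;
    }

    while (fake_yylex() != 0)
        count++;

    *tokens_seen = count;
    return 0;
}
```

The caller gets a partial count plus an error code, so it can still
act on whatever was scanned before the failure rather than dying and
triggering a retry.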
David