lexer changes

David Relson relson at osagesoftware.com
Tue Nov 11 15:19:56 CET 2003


On Tue, 11 Nov 2003 14:59:01 +0100
Boris 'pi' Piwinger <3.14 at logic.univie.ac.at> wrote:

> David Relson wrote:
> 
> > I've tested your patch and "make check" FAILs many of the tests. 
> > The patch will not be applied.
> 
> Of course it fails, since new tokens are introduced. Without
> that it passes (attached).
> 
> >> 0) Allowing two-byte-tokens (see my test on the other list)
> > 
> > Not allowed, for reasons I've previously stated.
> 
> I cannot see a reason. Your test did not show the false
> positives, where this change brings the major improvement.
> Also targets are missed and different targets show one
> method ahead or the other. So I did a much more detailed test.

As I have said several times before, my test sets a false positive (fp)
target of 0.25% of the message count, finds the cutoff value that
corresponds to that target, scores the spam, and reports the false
negative counts.  There's no need to report the fp value for each test
because the target is fixed at 0.25%.
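
For illustration only, the procedure described above could be sketched
roughly as follows.  This is Python, not the actual test scripts; the
function names, the "per 10000" parameter, and the sample scores are all
hypothetical, and it assumes a non-empty list of ham scores where a
higher score means "more spammy".

```python
def cutoff_for_fp_target(ham_scores, fp_per_10000=25):
    """Pick a cutoff so that at most 0.25% (25 per 10000) of ham is flagged.

    Hypothetical helper: integer arithmetic avoids float rounding when
    computing how many false positives the target allows.
    """
    allowed_fp = len(ham_scores) * fp_per_10000 // 10000
    ranked = sorted(ham_scores, reverse=True)
    # Messages scoring strictly above the returned value count as spam,
    # so no more than allowed_fp ham messages end up above the cutoff.
    return ranked[allowed_fp]


def false_negatives(spam_scores, cutoff):
    """Count spam messages that do not score above the cutoff."""
    return sum(1 for score in spam_scores if score <= cutoff)


# Made-up data: 400 ham scores, so the 0.25% target allows exactly 1 fp.
ham = [i / 1000 for i in range(400)]
spam = [0.2, 0.398, 0.5, 0.9, 0.99]

cutoff = cutoff_for_fp_target(ham)
print("cutoff:", cutoff)
print("false negatives:", false_negatives(spam, cutoff))
```

Because the fp budget is the same for every run, the false negative
count is the only number that varies between lexer variants in this
kind of comparison.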

> >> 1) Some \ slipped back in. Out again.
> > 
> > None of them "slipped" in. 
> 
> Actually in
> <20031104092536.1d799059.relson at osagesoftware.com> you had
> some out which are now back in. Example:
> +TOKENBACK	[^[:blank:]<>;=():&%$#@+|/\\{}^\"?*,[:cntrl:][\]._+-]

Yes, I put them back in because "+" and "-" are special characters in
many flex constructs and having the backslashes will help avoid future
problems if the expressions are modified.
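
A small sketch of the kind of future problem meant here, using Python's
"re" module as a stand-in (an assumption made for illustration: flex
character classes treat an unescaped "-" between two characters as a
range in the same way, but this is not flex code and the patterns are
shortened, hypothetical versions of the class quoted above).

```python
import re

# '-' is the last character of the class, so it is taken literally.
original = re.compile(r"[._+-]")

# Someone later appends '~' without escaping: "+-~" is now a range
# covering every character from '+' through '~', letters included.
edited_unescaped = re.compile(r"[._+-~]")

# With the backslash in place the same edit stays harmless.
edited_escaped = re.compile(r"[._+\-~]")


def matches(pattern, ch):
    """Return True if the single character ch is in the class."""
    return pattern.fullmatch(ch) is not None


for name, pat in [("original", original),
                  ("edited, unescaped", edited_unescaped),
                  ("edited, escaped", edited_escaped)]:
    print(name, "matches 'A':", matches(pat, "A"),
          "matches '-':", matches(pat, "-"))
```

Only the unescaped, edited class matches "A"; the other two still match
just the punctuation that is listed.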

> > I'm satisfied with those that are there and
> > don't see a need for spending time discussing them or removing them.
> 
> I spent the time (three hours for cleanup alone) already.

And I've spent a lot of time as well.  

> >> 3) The comment of 09/01/03 does not fit the context. I
> >>    *don't* change this one.
> > 
> > Yes it does. 
> 
> "^[\?]" is not part of the pattern.

It's a typo.  I've fixed it.

> >> I cannot find the price range rule which is announced in the
> >> comment of 12 May 2003.
> > 
> > Hint: look for the word "dollar"
> 
> I only find that a token is returned. That token does not
> allow for any -..

So you want to include a minus sign???

> >> I am not sure about  MSG_COUNT	^\".MSG_COUNT\" -- are those
> >> \ needed?
> > 
> > Does it matter?  It works.  As they say, "if it ain't broke, don't
> > fix it."
> 
> It is confusing. And as you said yesterday, there is "code
> that _looks_ ok (on casual inspection) but is actually
> incorrect", so I try to understand if it is correct, not if
> it seems to work.

Try it.  Take a pristine lexer_v3.l, make sure it passes "make check",
then remove the quotes and see what happens.

> pi
> 



