lexer changes

Tue Nov 11 15:50:59 CET 2003

David Relson wrote:

> As has been said several times before, my test sets a fp target of 0.25%
> (of the message count), finds the cutoff value that corresponds to that
> target, scores the spam, and reports the false negative counts.  There's
> no need to report the value for each test because the target is fixed at
> 0.25%

If you look closely at my tests, you will see that this produces more or
less random results. Example:

wo (fn):  0.500000    26     23     19     68
wo (fp):  0.500000     5      4      4     13
wi (fn):  0.581092    50     41     41    132
wi (fp):  0.581092     3      2      1      6
wi (fn):  0.499993    26     23     19     68
wi (fp):  0.499993     6      4      5     15
wi (fn):  0.457261    15     15     14     44
wi (fp):  0.457261    14     10      8     32

wo (fn):  0.500000    30     30     20     80
wo (fp):  0.500000     4      5      3     12
wi (fn):  0.546680    41     36     34    111
wi (fp):  0.546680     2      2      2      6
wi (fn):  0.499780    29     30     19     78
wi (fp):  0.499780     5      6      4     15
wi (fn):  0.457308    16     18     13     47
wi (fp):  0.457308    14     10      8     32

Which one is better? For 6 fp the second is better, for 15
fp and 32 fp the first. So you make your decision depending
on your choice of fp target.
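To spell the comparison out, here is a small Python sketch. It takes the
total fn counts of the "wi" rows in the two blocks above, keyed by their
total fp counts, and says which block has fewer false negatives at each
fp level. The names first, second and better_at are made up for this
illustration only.

```python
# Total false negatives of the "wi" rows in the two blocks above,
# keyed by the total false positives of the same row.
first = {6: 132, 15: 68, 32: 44}
second = {6: 111, 15: 78, 32: 47}


def better_at(fp):
    """Return which block has fewer false negatives at the given fp total."""
    if first[fp] == second[fp]:
        return "tie"
    return "first" if first[fp] < second[fp] else "second"


for fp in sorted(first):
    print(fp, first[fp], second[fp], better_at(fp))
```

The winner flips between fp = 6 and fp = 15, so a single fixed target
only ever shows one side of that.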

In others of those tests you can see that there is a big
difference in the fp's to begin with. But once you shift
to some target, you don't see that anymore.

wo (fn):  0.500000    22     18     24     64
wo (fp):  0.500000     5      4      6     15
wi (fn):  0.500248    22     18     24     64
wi (fp):  0.500248     5      4      5     14

wo (fn):  0.500000    24     22     22     68
wo (fp):  0.500000     4      4      3     11
wi (fn):  0.499999    24     22     21     67
wi (fp):  0.499999     5      4      4     13

You can also see that the target was missed: there ain't no such
thing as a *fixed* target. So you may not actually be comparing
the same numbers.
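A minimal Python sketch of the "pick the cutoff for an fp target"
procedure shows one way a target can be missed. This is not the actual
test script; the function names and the scores are invented for
illustration, and it assumes a message counts as spam when its score is
at or above the cutoff. Because scores are discrete, the fp count jumps
in steps as the cutoff moves, so the requested count may not be
reachable at all.

```python
def cutoff_for_target(ham_scores, target_fp):
    """Return (cutoff, fp) for the lowest cutoff with fp <= target_fp.

    A message is classified as spam when its score is >= cutoff.
    """
    # Candidate cutoffs: each distinct ham score, plus one above all of them.
    candidates = sorted(set(ham_scores)) + [float("inf")]
    for cutoff in candidates:
        fp = sum(1 for s in ham_scores if s >= cutoff)
        if fp <= target_fp:
            return cutoff, fp


def false_negatives(spam_scores, cutoff):
    """Count spam messages that score below the cutoff."""
    return sum(1 for s in spam_scores if s < cutoff)


# Made-up scores: three ham messages tie at 0.46.
ham = [0.1, 0.2, 0.46, 0.46, 0.46, 0.5, 0.9]
spam = [0.3, 0.48, 0.7, 0.99]

cutoff, fp = cutoff_for_target(ham, target_fp=3)
print(cutoff, fp, false_negatives(spam, cutoff))  # prints: 0.5 2 2
```

The target was 3 fp, but the tie at 0.46 means the fp count goes
straight from 5 to 2, so the run ends up at 2 fp. Two runs "at the same
target" can therefore sit at different actual fp counts.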

>> >> 1) Some \ slipped back in. Out again.
>> > 
>> > None of them "slipped" in. 
>> 
>> Actually in
>> <20031104092536.1d799059.relson at osagesoftware.com> you had
>> some out which are now back in. Example:
>> +TOKENBACK	[^[:blank:]<>;=():&%$#@+|/\\{}^\"?*,[:cntrl:][\]._+-]
> 
> Yes, I put them back in because "+" and "-" are special characters in
> many flex constructs and having the backslashes will help avoid future
> problems if the expressions are modified.

+ is not special in a character class. BTW, we now have two +
in that class. From the man page:
Note that inside of a character class, all regular expression
operators lose their special meaning except escape ('\') and
the character class operators, '-', ']', and, at the beginning
of the class, '^'.

Actually, a - at the end or at the beginning of the class is also fine.
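A quick way to see the same behaviour outside of flex is Python's re
module. It is not flex, so take this only as an analogy, but for these
particular cases its character classes follow the same convention: + is
literal with or without a backslash, and - is literal at either end of
the class and forms a range only in the middle.

```python
import re

# "+" is literal inside a character class, escaped or not.
plain = re.compile(r"[+]")
escaped = re.compile(r"[\+]")

# "-" is literal when it is the first or last character of the class.
dash_last = re.compile(r"[a+-]")
dash_first = re.compile(r"[-a+]")

# In the middle it forms a range, here "+" (0x2B) through "/" (0x2F),
# which also takes in "," (0x2C).
dash_range = re.compile(r"[+-/]")

print(bool(plain.fullmatch("+")), bool(escaped.fullmatch("+")))
print(bool(dash_last.fullmatch("-")), bool(dash_first.fullmatch("-")))
print(bool(dash_last.fullmatch(",")), bool(dash_range.fullmatch(",")))
```

So the backslash in front of + changes nothing, and the one in front of
- is only needed if the - sits between two other characters.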

>> >> I cannot find the price range rule which is announced in the
>> >> comment of 12 May 2003.
>> > 
>> > Hint: look for the word "dollar"
>> 
>> I only find that a token is returned. That token does not
>> allow for any -..
> 
> So you want to include a minus sign???

I'm not sure. I just read that we do something with it, and I
cannot find it.

pi