testing parsing changes

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Sun Nov 9 12:14:27 CET 2003


David Relson <relson at osagesoftware.com> wrote:

>> Also interesting: What happens if you take your real
>> parameters, not the standard? This would show what really
>> happens.
>
>As stated, this is a test to determine whether the lexer changes
>contribute to improved parsing or not.  Most bogofilter users use the
>default parameters, so calling them "not real" is a mistake.

Maybe for some. Not so for others. But you generalize from
this one set of parameters.

>> Also here you don't give false positives. The target is not
>> guaranteed to work as expected; it can be a bit off due to
>> several messages with the same score (I have observed that
>> in tests). Also your target is way too high for my taste: it
>> is 1 false positive in 400 messages (in my case that would
>> be every other day!).
>
>I _do_ give false positive counts.  One of the test parameters is
>setting the false positive target (for ham) at 0.25% of the number of
>ham messages.  That count is used (along with the scores for the ham
>messages) to determine the cutoff value.  

And as we have seen, this does not work precisely (and
cannot). Also, you don't give false positive counts for the
default parameters, where you said most people would use
them, so that is exactly where it would be crucial to see
what happens. Who cares if the number of false negatives
only changes slightly, when the number of false positives
decreases by a few (which is a large part of the total)?
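To illustrate the tie problem mentioned above, here is a minimal Python sketch (not bogofilter's actual code; the function name is invented, the 0.25% figure is from the discussion) of deriving a cutoff from a false-positive target:

```python
def cutoff_for_fp_target(ham_scores, fp_rate=0.0025):
    """Set the cutoff at the allowed-th highest ham score.

    Illustrative sketch only, not bogofilter's implementation.
    """
    allowed = int(len(ham_scores) * fp_rate)  # e.g. 0.25% of ham messages
    ranked = sorted(ham_scores, reverse=True)
    # Messages scoring >= cutoff are classified as spam.  When several ham
    # messages are tied at the boundary score, the realized false-positive
    # count overshoots the target -- the target cannot be hit precisely.
    cutoff = ranked[allowed - 1] if allowed > 0 else ranked[0] + 1e-9
    actual_fp = sum(1 for s in ham_scores if s >= cutoff)
    return cutoff, actual_fp

# 800 ham messages, target allows 2 false positives -- but two messages
# tied at 0.8 push the realized count to 3.
cutoff, fp = cutoff_for_fp_target([0.9, 0.8, 0.8] + [0.1] * 797)
```

With two allowed false positives but ham messages tied at the boundary score, the realized count comes out at three: exactly the "a bit off" behaviour described above.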

>Once the cutoff value is
>determined, the spam messages are all scored and the number of false
>negatives is reported.

And this hides the real result. Maybe several messages are
now moved below the original cutoff, but you don't see it
because your target cutoff is far away, perhaps in an area
where not much happens.
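A sketch of that point (all scores and cutoffs here are invented for illustration): a false-negative count taken only at a distant target cutoff can stay constant while messages shift across the original cutoff.

```python
def false_negatives(spam_scores, cutoff):
    """Spam scoring below the cutoff counts as a miss (false negative)."""
    return sum(1 for s in spam_scores if s < cutoff)

# Hypothetical spam scores before and after a parsing change:
before = [0.99, 0.97, 0.60, 0.52, 0.45]
after  = [0.99, 0.97, 0.48, 0.44, 0.45]  # two messages slid below 0.50

original_cutoff, target_cutoff = 0.50, 0.95

false_negatives(before, target_cutoff)    # 3
false_negatives(after, target_cutoff)     # 3 -- looks unchanged
false_negatives(before, original_cutoff)  # 1
false_negatives(after, original_cutoff)   # 3 -- visible only here
```

Measured only at the far-off target cutoff, both runs report the same count, even though two messages moved below the original cutoff.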

>The test did not include your "numeric" change.  Once you start allowing
>a digit at the beginning of a token, then values like "110.2.43" go into
>the wordlist.  

Right.
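For illustration only (these regexes are a rough stand-in for the flex rules, not bogofilter's actual lexer): allowing a leading digit is the difference between a token pattern that requires a letter first and one that does not.

```python
import re

# Illustrative token patterns -- assumptions, not bogofilter's flex rules.
default_token = re.compile(r"[A-Za-z][A-Za-z0-9'.-]*")   # must start with a letter
numeric_token = re.compile(r"[A-Za-z0-9][A-Za-z0-9'.-]*") # may start with a digit

text = "Received: from host 110.2.43 id abc123"
default_token.findall(text)  # skips the digit-leading value
numeric_token.findall(text)  # also captures "110.2.43"
```

With the relaxed pattern, every digit-leading value in headers and bodies becomes a wordlist entry, which is why the quantity of numeric tokens grows so quickly.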

>When I did a quick experiment with that lexer change, the
>quantity of numeric tokens was large and the tokens didn't appear to be
>important.  

In which sense? Inside mind_dev?

pi




More information about the Bogofilter mailing list