Short tokens and numbers
Boris 'pi' Piwinger
3.14 at logic.univie.ac.at
Tue Nov 4 13:49:42 CET 2003
David Relson wrote:
>> Well, if I take them away it still builds and passes all
>> tests. I also cannot find a difference with *all* my mails.
>> So I assume my changes are correct. I'm also cleaning up
>> the order to make it more easily readable.
>
> Please send a single patch with the changes you want. I'm not
> interested in evaluating two different, conflicting change sets.
As I said, the first patch just cleans things up. The second
changes what is seen as a token. So if you don't want to
change anything, use the first.
People who would like to see if smaller tokens and numbers
work for them can use the second patch.
> "make check" is a tool to detect regressions. It does not test all
> possible situations, so it's possible that changes to lexer_v3.l could
> be wrong and not show up in the test.
That's right. That's why I also tested against all my mails,
which were all rated exactly the same.
> As to token length, lexer_v3.l has its maximum of 70 characters. As
> that's very old, I'm guessing it was done to "swallow" a complete base64
> line. In token.c, long tokens are discarded (based on the value of
> MAXTOKENLEN). That's likely what you've noticed.
I see. But actually even shorter tokens like
abcdefghijklmnopqrstuvwxyzabcde are ignored.
> Off-hand, I don't
> know what the effect would be if we change (or remove) the 70 in TOKEN
None. It is never used; the preceding + already swallows
everything, so the bounded part never matches anything more.
pi
More information about the Bogofilter mailing list