Short tokens and numbers

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Tue Nov 4 13:49:42 CET 2003


David Relson wrote:

>> Well, if I take them away it still builds and passes all
>> tests. I also cannot find any difference across *all* my mails.
>> So I assume my changes are correct. I'm also cleaning up
>> the order to make it more easily readable.
> 
> Please send a single patch with the changes you want.  I'm not
> interested in evaluating two different, conflicting change sets.

As I said, the first patch just cleans things up; the second
changes what is seen as a token. So if you don't want to
change any behavior, use only the first.

People who would like to see if smaller tokens and numbers
work for them can use the second patch.

> "make check" is a tool to detect regressions.  It does not test all
> possible situations, so it's possible that changes to lexer_v3.l could
> be wrong and not show up in the test.

That's right. That's why I tested against all my mails; every
one of them was rated exactly the same.

> As to token length, lexer_v3.l has its maximum of 70 characters.  As
> that's very old, I'm guessing it was done to "swallow" a complete base64
> line.  In token.c, long tokens are discarded (based on the value of
> MAXTOKENLEN).  That's likely what you've noticed.

I see. But actually even much shorter tokens like
abcdefghijklmnopqrstuvwxyzabcde (31 characters) are ignored.
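
A minimal sketch of that filtering step. The value 30 for
MAXTOKENLEN is only an assumption consistent with the
31-character example being ignored; the real constant is
defined in token.c:

```python
# Hypothetical sketch of the token-length filter described above.
# 30 is a guessed value for MAXTOKENLEN, not taken from token.c.
MAXTOKENLEN = 30

def token_accepted(tok):
    """Discard tokens longer than MAXTOKENLEN, as token.c does."""
    return len(tok) <= MAXTOKENLEN

# The 31-character example from above falls just over the limit.
print(token_accepted("abcdefghijklmnopqrstuvwxyzabcde"))  # False
```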

> Off-hand, I don't
> know what the effect would be if we change (or remove) the 70 in TOKEN

None. The 70 is never effective: the preceding + already
swallows the entire run, so the bound can never take effect.
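
To illustrate with an invented pattern (not the actual
lexer_v3.l rule): when a bounded repetition follows a greedy +
over the same character class, the bound cannot shorten the
match. Flex's longest-match rule gives the same result; the
sketch below shows the effect with Python's re module:

```python
import re

# Illustrative pattern only, not the real TOKEN definition:
# a greedy "+" followed by a bounded repetition of the same class.
pattern = re.compile(r"[a-z]+[a-z]{0,70}")

run = "a" * 200  # far longer than the 70-character bound

# The "+" consumes the whole run, so the {0,70} part matches
# zero characters and the bound never limits the match length.
print(pattern.match(run).end())  # 200
```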

pi





More information about the Bogofilter mailing list