Understanding lexer_v3.l changes

Boris 'pi' Piwinger 3.14 at piology.org
Sun Nov 26 20:20:25 CET 2006


David Relson <relson at osagesoftware.com> wrote:

>> >> >It allows dots within IDs.
>> >
>> >ID formats vary between mail programs.  Allowing dots increases the
>> >set of acceptable IDs.  As we know, increasing the set of tokens can
>> >be both useful and unnecessary.
>> 
>> Right. AFAICS only one rule uses it:
>> :<INITIAL>\n?[[:blank:]]id{WHITESPACE}+{ID}      { return QUEUE_ID; }
>> 
>> The \n? does not seem to have any function. But I do see
>> dots in IDs, so I will also add it.
>
>Not certain, but it's likely there to help line folding.

That was the idea, but since the \n? is optional, I would
assume that "something else id abc" would also match.
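
A rough way to check that assumption outside of flex is sketched
below in Python. It is only an approximation: the definitions of
WHITESPACE and ID are stand-ins I made up (the real ones live in
lexer_v3.l and are not quoted here), [[:blank:]] is written as
[ \t] because Python's re has no POSIX classes, and re.search
scans for a match anywhere in the string, whereas flex matches at
its current scan position within the <INITIAL> start condition.

```python
import re

# Assumed stand-ins, NOT copied from lexer_v3.l:
WHITESPACE = r"[ \t]"
ID = r"[0-9A-Za-z.]+"   # hypothetical ID: alphanumerics plus dots

# The rule as quoted above, with the newline optional.
optional_nl = re.compile(r"\n?[ \t]id" + WHITESPACE + "+" + ID)

# A hypothetical variant in which the newline is required.
required_nl = re.compile(r"\n[ \t]id" + WHITESPACE + "+" + ID)

# A folded header line (made-up sample) and the mid-line case.
folded = "by mail.example.org\n id abc.123"
inline = "something else id abc"


def matches(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    for name, pattern in (("optional", optional_nl),
                          ("required", required_nl)):
        for label, text in (("folded", folded), ("inline", inline)):
            print(name, label, matches(pattern, text))
```

Under these assumptions the optional form accepts both samples,
while the required form accepts only the folded one. So if the
\n? is meant to restrict the rule to folded lines, it does not do
that as written.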

>People running bogofilter after a major database problem have shown
>that even after removing any _one_ of the following token groups:
>
>   tokens beginning with lower case letters
>   tokens beginning with upper case letters
>   tokens beginning with a-m
>   tokens beginning with n-z
>
>bogofilter still functions well.  Of course, it works better when
>they're _all_ present.
>
>If you remember, early on bogofilter converted upper case to lower
>case.  Disabling that increased the wordlist size and bogofilter's
>accuracy.

Right. I am more concerned about additional constructions
which are intuitive, but not significant in the end.

pi


