Understanding lexer_v3.l changes
Boris 'pi' Piwinger
3.14 at piology.org
Sun Nov 26 20:20:25 CET 2006
David Relson <relson at osagesoftware.com> wrote:
>> >> >It allows dots within IDs.
>> >
>> >ID formats vary between mail programs. Allowing dots increases the
>> >set of acceptable IDs. As we know, increasing the set of tokens can
>> >be both useful and unnecessary.
>>
>> Right. AFAICS only one rule uses it:
>> :<INITIAL>\n?[[:blank:]]id{WHITESPACE}+{ID} { return QUEUE_ID; }
>>
>> The \n? does not seem to have any function. But I do see
>> dots in IDs, so I will also add it.
>
>Not certain, but it's likely there to help with line folding.
That was the idea, but since the \n is optional, I would
assume that "something else id abc" would also match.
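A rough way to check this outside flex is to approximate the rule with a Python regular expression. This is only a sketch: the WHITESPACE and ID definitions below are assumptions and not copied from lexer_v3.l, REQUIRED_NL is a hypothetical stricter variant, and flex's longest-match competition with other rules is not modelled.

```python
import re

# Assumed stand-ins for the lexer's named definitions; the real
# WHITESPACE and ID definitions in lexer_v3.l may differ.
WHITESPACE = r"[ \t\n]"
ID = r"[A-Za-z0-9][A-Za-z0-9.]*"  # hypothetical: dots allowed inside IDs

# Approximation of: \n?[[:blank:]]id{WHITESPACE}+{ID}
OPTIONAL_NL = re.compile(r"\n?[ \t]id" + WHITESPACE + r"+" + ID)
# Hypothetical stricter variant that insists on the leading newline
REQUIRED_NL = re.compile(r"\n[ \t]id" + WHITESPACE + r"+" + ID)


def matches_at(pattern, text, pos):
    """Return the text matched starting exactly at pos, or None."""
    m = pattern.match(text, pos)
    return m.group(0) if m else None


# A folded header line (made-up ID) and the mid-line case in question
folded = "by mail.example.org\n\tid AB12.CD34"
inline = "something else id abc"

pos_folded = folded.index("\n")
pos_inline = inline.index(" id")

print(repr(matches_at(OPTIONAL_NL, folded, pos_folded)))
print(repr(matches_at(OPTIONAL_NL, inline, pos_inline)))
print(repr(matches_at(REQUIRED_NL, inline, pos_inline)))
```

With the optional newline, the pattern matches both the folded line and the mid-line " id abc". Only the variant with a mandatory \n rejects the latter. Whether the real lexer would return QUEUE_ID there also depends on which other rules compete at that position.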
>People running bogofilter after a major database problem have shown
>that even after removing any _one_ of the following token groups:
>
> tokens beginning with lower case letters
> tokens beginning with upper case letters
> tokens beginning with a-m
> tokens beginning with n-z
>
>bogofilter still functions well. Of course, it works better when
>they're _all_ present.
>
>If you remember, early on bogofilter converted upper case to lower
>case. Disabling that increased the wordlist size and bogofilter's
>accuracy.
Right. I am more concerned about additional constructions
which are intuitive, but not significant in the end.
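For reference, the case-folding effect David describes can be sketched with a toy token list. The tokens are made up for illustration and this is not bogofilter's actual wordlist code; it only shows why keeping case produces more distinct wordlist entries.

```python
# Hypothetical tokens as they might appear across several messages
tokens = ["Free", "FREE", "free", "Offer", "offer", "meeting"]

# Case-sensitive wordlist: every spelling is its own entry
case_sensitive = set(tokens)

# Case-folded wordlist: spellings differing only in case collapse
case_folded = {t.lower() for t in tokens}

print(len(case_sensitive), sorted(case_sensitive))
print(len(case_folded), sorted(case_folded))
```

Here the case-sensitive set has six entries and the folded set has three, so "FREE" and "free" can carry separate statistics only in the former.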
pi