[patch] small lexer changes

Mon Oct 7 23:35:34 CEST 2002

At 05:24 PM 10/7/02, Graham Wilson wrote:
>On Mon, Oct 07, 2002 at 04:46:22PM -0400, David Relson wrote:
> > lexer.c is long because of the list of html tags words that are recognized
> > and discarded.  Taking them out makes it much much shorter and is something
>
>why are html tags taken out? i thought those would help identify a
>message as spam.

I can't answer "why" for you.  I can merely report what is so.

ESR wrote and released bogofilter up through version 0.7.3, and I don't 
know his reasons.  I can speculate that he figured lexing the tags into 
oblivion would give faster and/or better results.

It is my belief that flex generates code to do very fast matches using a 
tree-structured algorithm of some sort.  If this is correct, an ignore list 
would have to be very well implemented to compete on speed.
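
For comparison, here is a rough sketch of what a separate ignore list might 
look like: a sorted array searched with bsearch().  This is not code from 
bogofilter.  The names ignore_words and is_ignored and the handful of tag 
words are made up for illustration, and I am assuming the lexer hands over 
tokens that are already lowercased.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical ignore list, not bogofilter's actual word list.
 * It must stay sorted in strcmp() order for bsearch() to work. */
static const char *const ignore_words[] = {
    "align", "bgcolor", "body", "border", "font",
    "href", "html", "table", "td", "tr"
};

/* bsearch() passes a pointer to the key and a pointer to an array element;
 * both point at a (const char *), so dereference once before comparing. */
static int cmp_word(const void *key, const void *elem)
{
    const char *k = *(const char *const *)key;
    const char *e = *(const char *const *)elem;
    return strcmp(k, e);
}

/* Return 1 if the (already lowercased) token is on the ignore list, else 0. */
int is_ignored(const char *word)
{
    size_t n = sizeof ignore_words / sizeof ignore_words[0];

    assert(word != NULL);
    return bsearch(&word, ignore_words, n, sizeof ignore_words[0],
                   cmp_word) != NULL;
}
```

The lexer would call is_ignored(token) on each word and drop it on a nonzero 
result.  Note that every lookup costs several string comparisons on top of 
the scanning the lexer has already done for that token, whereas a pattern 
in the flex source is matched while the token is being scanned.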