[patch] small lexer changes

Mon Oct 7 23:25:56 CEST 2002

On Mon, Oct 07, 2002 at 04:46:22PM -0400, David Relson wrote:
> At 04:24 PM 10/7/02, Matthias Andree wrote:
> >I'm just wondering if we could reduce the lexer.c code size without
> >sacrificing too much speed. lexer.c is >46,000 lines here, which pretty
> >much stinks.
> 
> lexer.c is long because of the list of html tags words that are recognized 
> and discarded.  Taking them out makes it much much shorter and is something 
> I've done while testing its parsing.  However that would likely cost time 
> during analysis - all those words to lookup in the word lists.

We could also move the HTML words from lexer.c into the ignore list once it
is implemented. That may even be better, since people who don't want to
exclude HTML keywords can simply delete them from the ignore list.

Presuming that lexer.c matches those words with a linear search, looking them
up in the ignore list *may* be faster.
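
To illustrate what I mean, here is a rough sketch of an ignore-list lookup.
Since the ignore list is not implemented yet, everything here is an
assumption: the names ignore_list, cmp_word and is_ignored are made up, the
handful of words only stands in for the real list, and a sorted array with
bsearch() is just one possible lookup scheme (a hash table would do as well).
It also assumes the word has already been lowercased.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical ignore list. It must stay sorted in strcmp() order,
 * because bsearch() relies on that. */
static const char *ignore_list[] = {
    "body", "font", "href", "html", "img", "table", "td", "tr"
};
static const size_t ignore_count = sizeof ignore_list / sizeof ignore_list[0];

/* bsearch() comparator: key is the word itself, elem points at a list slot. */
static int cmp_word(const void *key, const void *elem)
{
    return strcmp((const char *)key, *(const char *const *)elem);
}

/* Returns 1 if the (already lowercased) word is on the ignore list, else 0.
 * The lookup is a binary search, so it takes O(log n) string comparisons. */
static int is_ignored(const char *word)
{
    assert(word != NULL);
    return bsearch(word, ignore_list, ignore_count,
                   sizeof ignore_list[0], cmp_word) != NULL;
}
```

With this, is_ignored("table") returns 1 and is_ignored("money") returns 0.
Dropping a word from the array is all it takes to stop ignoring it.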

-Gyepi