html_tokenizer

Wed Feb 19 17:46:02 CET 2003

Nick,

It's looking like a job well done :-)

I've got a copy of bogolexer that uses html_tokenizer.l as an alternative 
to the usual lexer_text_html.l.  The '-j' switch determines whether your 
code or the old code is used.

It looks good and I'd like to run "make check" with it.  Unfortunately, the 
new code returns additional tokens, which will cause the regression tests to 
complain.  With a sample HTML message, the new code returns tokens from 
"<body...>" and "<a ...>" tags.  My quick tests didn't reveal which part of 
the code was allowing the extras to get back to bogolexer.  Can you point me 
in the right direction to make them go away (at least temporarily)?

Thanks.

David
--------------------------------------------------------
David Relson                   Osage Software Systems, Inc.
relson at osagesoftware.com       Ann Arbor, MI 48103
www.osagesoftware.com          tel:  734.821.8800