html_tokenizer
David Relson
relson at osagesoftware.com
Wed Feb 19 17:46:02 CET 2003
Nick,
It's looking like a job well done :-)
I've got a copy of bogolexer that uses html_tokenizer.l as an alternative
to the usual lexer_text_html.l. The '-j' switch determines whether your
code or the old code is used.
It looks good and I'd like to run "make check" with it. Unfortunately, the
new code returns additional tokens, which will cause the regression tests to
complain. With a sample html message, the new code returns tokens from
"<body...>" and "<a ...>" tags. My quick tests didn't reveal which part of
the code lets the extra tokens get back to bogolexer. Can you point me in
the right direction to make them go away (at least temporarily)?
Thanks.
David
--------------------------------------------------------
David Relson Osage Software Systems, Inc.
relson at osagesoftware.com Ann Arbor, MI 48103
www.osagesoftware.com tel: 734.821.8800