The concept of using lex to parse comments and html tags out of html...

David Relson relson at osagesoftware.com
Mon Feb 17 15:05:47 CET 2003


Nick,

You've been quietly busy :-)  I was wondering what you were up to.  Now I know.

A quick test with a randomly chosen hunk of html shows that the tokenizer 
does _something_.  Now it's time to evaluate and learn more about what it 
really does.

By the way, since the 40-line definition of YY_INPUT is a macro, and so is 
hard to step through in the debugger, I moved its body into a function, 
yyinput().  The code now looks like:

#define YY_INPUT(buf,result,max_size) result=yyinput(buf,max_size)
int yyinput(char *buf, int max_size)
{
... [your code, without the trailing backslashes] ..
return result;
}
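
For anyone who wants to try the same trick, here is a minimal, self-contained sketch of the pattern. It is not Nick's code: the body is a stand-in that assumes the input is a plain stdio stream, and input_fp is a hypothetical name made up for the illustration. The only flex-specific fact relied on is that YY_INPUT must store the number of bytes read in its second argument, with 0 (YY_NULL) meaning end of input.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical input source, for illustration only.  The real macro body
 * reads from wherever the original 40-line definition reads. */
static FILE *input_fp;

/* Copy up to max_size bytes into buf and return how many were copied.
 * A return of 0 tells the scanner there is no more input. */
int yyinput(char *buf, int max_size)
{
    assert(buf != NULL && max_size > 0);
    return (int) fread(buf, 1, (size_t) max_size, input_fp);
}

/* The macro shrinks to one assignment, so a breakpoint on yyinput()
 * lets the debugger step through the real logic line by line. */
#define YY_INPUT(buf, result, max_size) \
    ((result) = yyinput((buf), (max_size)))
```

In a lex source file both pieces would normally sit in the %{ ... %} part of the definitions section, so the generated scanner sees them before it expands YY_INPUT.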

Let me spend some more time with this.  I'll try merging it into a test 
bogofilter and see what happens.

Good Work!!!!

David




