The concept of using lex to parse comments and html tags out of html...

Nick Simicich njs at scifi.squawk.com
Tue Feb 18 03:29:39 CET 2003


At 09:05 AM 2003-02-17 -0500, David Relson wrote:

>Nick,
>
>You've been quietly busy :-)  I was wondering what you were up to.  Now I 
>know.
>
>A quick test with a randomly chosen hunk of html shows that the tokenizer 
>does _something_.  Now it's time to evaluate and learn more about what it 
>really does.
>
>By the way, since the 40 line definition of YY_INPUT makes it hard to step 
>through in the debugger,  I created a function yyinput().  The code now 
>looks like:

Why would you need to debug that code?  It all worked the first time, or at 
least everything I tested did.  I expected it all to work: it simply steps 
through the buffer and stashes a little as needed.

>#define YY_INPUT(buf,result,max_size) result=yyinput(buf,max_size)
>int yyinput(char *buf, int max_size)
>{
>... [your code, without the trailing backslashes] ..
>return result;
>}

This should work as well.  By the way, all of the code that actually reads 
from the file is a straight copy of the original YY_INPUT macro.  Once 
you are through testing, you will probably want to de-merge this, since you 
already have a YY_INPUT scheme that works.
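
For reference, a minimal sketch of the wrapper approach quoted above, 
assuming a plain fread-based reader in place of the elided body.  The names 
read_input_sketch and sketch_in are hypothetical stand-ins (the quoted 
function is yyinput, and a flex scanner would normally read from yyin); the 
actual 40-line body is not reproduced here.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical input stream for this sketch; a real scanner reads yyin. */
static FILE *sketch_in = NULL;

/*
 * Function form of the input hook.  Copies up to max_size bytes into buf
 * and returns the number of bytes read, or 0 at end of input.
 * Assumption: a plain fread() stands in for the real body, which is not
 * shown in this thread.
 */
static int read_input_sketch(char *buf, int max_size)
{
    size_t n;

    assert(buf != NULL);
    if (sketch_in == NULL || max_size <= 0)
        return 0;
    n = fread(buf, 1, (size_t)max_size, sketch_in);
    return (int)n;
}

/* The macro only forwards to the function, so a debugger can step into it. */
#define YY_INPUT(buf, result, max_size) \
    ((result) = read_input_sketch((buf), (max_size)))
```

With the body in a function, a breakpoint on read_input_sketch stops at 
every buffer refill, which a multi-line macro does not allow.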

>Let me spend some more time with this.  I'll try merging it into a test 
>bogofilter and see what happens.
>
>Good Work!!!!

That you may want to see about.


>David
>
>
>

--
SPAM: Trademark for spiced, chopped ham manufactured by Hormel.
spam: Unsolicited, Bulk E-mail, where e-mail can be interpreted generally 
to mean electronic messages designed to be read by an individual, and it 
can include Usenet, SMS, AIM, etc.  But if it is not all three of 
Unsolicited, Bulk, and E-mail, it simply is not spam. Misusing the term 
plays into the hands of the spammers, since it causes confusion, and 
spammers thrive on  confusion. Spam is not speech, it is an action, like 
theft, or vandalism. If you were not confused, would you patronize a spammer?
Nick Simicich - njs at scifi.squawk.com - http://scifi.squawk.com/njs.html
Stop by and light up the world!


