[patch] modified lexer interface

Mark M. Hoffman mhoffman at lightlink.com
Thu Oct 3 19:37:57 CEST 2002


* David Relson <relson at osagesoftware.com> [2002-10-03 07:18:38 -0400]:

> At 01:13 AM 10/3/02, Gyepi SAM wrote:

> >On Wed, Oct 02, 2002 at 10:01:24PM -0700, Mark M. Hoffman wrote:

> > > The point of this is to allow the lexer to pass meta-tokens like
> > > "subject:money" and "received:foo at bar.com".
> >
> >Glad to see that. I was thinking about this...
> >
> > > I haven't included any such tokenizing yet... I just want to check
> > > that you're all OK with this much first.
> >
> >Fine with me.
> >
> >-Gyepi
> 
> Mark,
> 
> Can you let us in on your plans/thoughts?

<snip>

I assume by "us" you mean bogofilter instead of bogofilter-dev...

I would like to copy the advanced tokenizing features of spambayes into
bogofilter.  E.g. they tokenize every word in the subject header as
"subject:Word"; they also downcase in general but allow mixed case in
the subject header... "subject:FREE" is much more damning than
"subject:free".
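For illustration, here is a minimal C sketch of building such a meta-token. The function name make_meta_token and its keep_case flag are hypothetical. They are not part of bogofilter or spambayes. The sketch assumes the lexer already knows which header field a word came from. It keeps the word's case for the subject and downcases everything else.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not bogofilter's actual code): builds a meta-token
 * such as "subject:FREE" from a header-field prefix and a word.
 * If keep_case is nonzero the word is copied verbatim (as spambayes does
 * for the Subject header); otherwise the word part is downcased.
 * Returns the token length, or -1 if it does not fit in out. */
static int make_meta_token(char *out, size_t outlen,
                           const char *prefix, const char *word,
                           int keep_case)
{
    assert(out != NULL && prefix != NULL && word != NULL);

    int n = snprintf(out, outlen, "%s:%s", prefix, word);
    if (n < 0 || (size_t)n >= outlen)
        return -1;              /* truncated: caller should skip this token */

    if (!keep_case) {
        /* Downcase only the word, i.e. everything after "prefix:". */
        for (char *p = out + strlen(prefix) + 1; *p != '\0'; p++)
            *p = (char)tolower((unsigned char)*p);
    }
    return n;
}
```

With keep_case set, "FREE" in the subject yields "subject:FREE". Without it, the result is "subject:free", so the two would be counted as separate tokens.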

My very next small step will be to add the two states <HEADER> and <BODY>
to the lexer.  After that, I want to pick through all the string/buffer
copying with a fine-tooth comb because I feel it's a little sloppy
right now (see Matthias' SourceForge bug entry).  Then, I will start to
add tokenizing features.  Who knows how much of this I'll get to, because
I'm moving 3000 miles across the US in a couple weeks.  If I don't start
packing soon, my wife will kill me. ;)
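The two-state idea can be sketched by hand in C. This is not flex start-condition code and not bogofilter's lexer. The names struct lexer, lexer_init and lexer_feed_line are hypothetical. The sketch relies on two RFC 2822 rules. The header section ends at the first empty line. A folded header line begins with whitespace and continues the previous field.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical states mirroring the <HEADER> and <BODY> lexer states. */
enum lex_state { LEX_HEADER, LEX_BODY };

struct lexer {
    enum lex_state state;
    char field[64];   /* current header field name, lowercased, e.g. "subject" */
};

static void lexer_init(struct lexer *lx)
{
    lx->state = LEX_HEADER;
    lx->field[0] = '\0';
}

static int is_blank_line(const char *line)
{
    return strcmp(line, "") == 0 || strcmp(line, "\n") == 0 ||
           strcmp(line, "\r\n") == 0;
}

/* Returns the state in which `line` should be tokenized, then advances
 * the state machine.  The blank separator line is reported as LEX_HEADER
 * (it carries no tokens) and switches all following lines to LEX_BODY. */
static enum lex_state lexer_feed_line(struct lexer *lx, const char *line)
{
    assert(lx != NULL && line != NULL);

    if (lx->state == LEX_BODY)
        return LEX_BODY;

    if (is_blank_line(line)) {
        lx->state = LEX_BODY;
        lx->field[0] = '\0';
        return LEX_HEADER;
    }

    /* Folded continuation lines start with whitespace and keep the
     * previous field name; any other line starts a new "Name: value". */
    if (line[0] != ' ' && line[0] != '\t') {
        const char *colon = strchr(line, ':');
        size_t len = colon ? (size_t)(colon - line) : 0;
        if (len >= sizeof lx->field)
            len = sizeof lx->field - 1;
        for (size_t i = 0; i < len; i++)
            lx->field[i] = (char)tolower((unsigned char)line[i]);
        lx->field[len] = '\0';
    }
    return LEX_HEADER;
}
```

Tracking the current field name in the header state is what would let a later tokenizing step attach prefixes like "subject:" to header words only. Body words would stay unprefixed.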

Regards,

-- 
Mark M. Hoffman
mhoffman at lightlink.com


For summary digest subscription: bogofilter-digest-subscribe at aotto.com
