[patch] modified lexer interface
Mark M. Hoffman
mhoffman at lightlink.com
Thu Oct 3 19:37:57 CEST 2002
* David Relson <relson at osagesoftware.com> [2002-10-03 07:18:38 -0400]:
> At 01:13 AM 10/3/02, Gyepi SAM wrote:
> >On Wed, Oct 02, 2002 at 10:01:24PM -0700, Mark M. Hoffman wrote:
> > > The point of this is to allow the lexer to pass meta-tokens like
> > > "subject:money" and "received:foo at bar.com".
> >
> >Glad to see that. I was thinking about this...
> >
> > > I haven't included any such tokenizing yet... I just want to check
> > > that you're all OK with this much first.
> >
> >Fine with me.
> >
> >-Gyepi
>
> Mark,
>
> Can you let us in on your plans/thoughts?
<snip>
I assume by "us" you mean bogofilter instead of bogofilter-dev...
I would like to copy the advanced tokenizing features of spambayes into
bogofilter. E.g. they tokenize every word in the subject header as
"subject:Word"; they also downcase in general but allow mixed case in
the subject header... "subject:FREE" is much more damning than
"subject:free".
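To make the idea concrete, here is a rough sketch in Python (the language spambayes is written in). It is not bogofilter or spambayes code: the function name tokenize, the WORD pattern, and the simple blank-line split between header and body are all assumptions made for illustration. Folded header lines and headers other than Subject are ignored.

```python
import re

# Hypothetical word pattern; a real tokenizer would be more permissive.
WORD = re.compile(r"[A-Za-z0-9]+")


def tokenize(message: str) -> list[str]:
    """Emit case-preserving "subject:" meta-tokens and downcased body tokens."""
    # Assumption: the first blank line separates the header from the body.
    header, _, body = message.partition("\n\n")
    tokens = []
    for line in header.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "subject":
            # Subject words keep their case and get a "subject:" prefix.
            tokens.extend("subject:" + word for word in WORD.findall(value))
    # Body words are downcased, as in the general case.
    tokens.extend(word.lower() for word in WORD.findall(body))
    return tokens


if __name__ == "__main__":
    sample = "From: a@example.com\nSubject: FREE money\n\nGet FREE stuff"
    print(tokenize(sample))
    # prints ['subject:FREE', 'subject:money', 'get', 'free', 'stuff']
```

Note that "FREE" in the subject stays distinct from "free" in the body, so the two can be scored separately.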
My very next small step will be to add the two states <HEADER> and <BODY>
to the lexer. After that, I want to pick through all the string/buffer
copying with a fine-tooth comb because I feel it's a little sloppy
right now (see Matthias' SourceForge bug entry). Then, I will start to
add tokenizing features. Who knows how much of this I'll get to, because
I'm moving 3000 miles across the US in a couple weeks. If I don't start
packing soon, my wife will kill me. ;)
Regards,
--
Mark M. Hoffman
mhoffman at lightlink.com
For summary digest subscription: bogofilter-digest-subscribe at aotto.com