Crm114 style context matching. Phrases and partial phrases.

Anthony Clarke anthony.c at mail.com
Sat May 17 13:06:01 CEST 2003


On Sat, May 17, 2003 at 06:49:11AM -0400, Greg Louis wrote:
> On 20030517 (Sat) at 0944:22 +0100, Anthony Clarke wrote:
> > Hi,
> > 
> > I've cobbled together a preprocessing script which allows phrases and
> > partial phrases to be categorised, as CRM114 does.
> > 
> > I don't think I have enough messages (1600 spam, 200 nonspam) to try out
> > the tuning scripts and get some firm results for this.
> 
> I do.  I'd be very glad to evaluate your script if you care to let me
> have a copy.
> 
> > The main disadvantage is that wordlists expand considerably.
> 
> They would, of course.  CRM114 trains exclusively on error for that
> reason.  Performance becomes an issue too, I suspect.  But those are
> challenges we might be able to deal with if the method looks like a
> major win.

OK, here's the script. I use it roughly like this in my procmailrc:

:0c
  | bogolexer -p | phraselexer.pl | bogofilter
  
Anthony.
  
-------------- next part --------------
A non-text attachment was scrubbed...
Name: phraselexer.pl
Type: application/x-perl
Size: 355 bytes
Desc: not available
URL: <http://www.bogofilter.org/pipermail/bogofilter/attachments/20030517/a1c13296/attachment.bin>
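
Since the attachment itself is not reproduced above, here is a minimal
sketch of what a phrase preprocessor of this kind might look like. It is
not the phraselexer.pl script from this post, and it is written in Python
rather than Perl. It assumes that "phrases and partial phrases" means
something like CRM114's sliding window, where each new token is combined
with every subset of the few tokens before it. The window size, the "_"
placeholder for a skipped word, the "-" joiner and the function name are
all hypothetical choices made for illustration.

```python
from collections import deque
from itertools import product

WINDOW = 3   # hypothetical: tokens per phrase, including the newest one
SKIP = "_"   # hypothetical: placeholder for a skipped word
JOIN = "-"   # hypothetical: separator between the words of a phrase


def phrases(tokens, window=WINDOW, skip=SKIP, join=JOIN):
    """Yield, for each token, the full and partial phrases that end at it.

    Every earlier token inside the window is either kept or replaced by
    the skip placeholder. Skipped positions at the start of a phrase are
    dropped, so the bare token itself is always among the results.
    """
    recent = deque(maxlen=window - 1)  # the tokens preceding the current one
    for tok in tokens:
        prev = list(recent)
        for mask in product((True, False), repeat=len(prev)):
            if True in mask:
                start = mask.index(True)  # first earlier token that is kept
                parts = [p if keep else skip
                         for p, keep in zip(prev[start:], mask[start:])]
            else:
                parts = []  # nothing kept: only the bare token remains
            yield join.join(parts + [tok])
        recent.append(tok)


if __name__ == "__main__":
    # Small demonstration on a fixed token list.
    for phrase in phrases(["cheap", "pills", "online"]):
        print(phrase)
```

Used as a filter, the same function could be fed the stripped lines of
standard input, assuming the lexer upstream writes one token per line.
With a window of n tokens, each token yields up to 2^(n-1) phrases, which
is in line with the wordlist growth mentioned earlier in the thread.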

