CRM114-style context matching: phrases and partial phrases

Greg Louis glouis at dynamicro.on.ca
Sat May 17 12:49:11 CEST 2003


On 20030517 (Sat) at 0944:22 +0100, Anthony Clarke wrote:
> Hi,
> 
> I've cobbled together a preprocessing script which allows phrases and
> partial phrases to be categorised as in CRM114.
> 
> I don't think I have enough messages (1600 spam, 200 nonspam) to try out
> the tuning scripts and get some firm results for this.

I do.  I'd be very glad to evaluate your script if you care to let me
have a copy.
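
For readers who haven't met the idea, here is a rough sketch of phrase
and partial-phrase tokenising, loosely in the spirit of CRM114's
sliding-window features.  It is not the script mentioned above; the
function name phrase_tokens, the window of 3 words and the "*"
placeholder for a skipped word are all assumptions made for
illustration.

```python
def phrase_tokens(words, window=3, skip="*"):
    """Return each word plus the phrases and partial phrases ending at it.

    Hypothetical helper: for every word, each of the up to (window - 1)
    preceding words is either kept or replaced by a placeholder.
    """
    tokens = []
    for i, word in enumerate(words):
        start = max(0, i - window + 1)
        context = words[start:i]
        n = len(context)
        # Each bit of mask decides whether one context word is kept.
        for mask in range(2 ** n):
            parts = [context[j] if mask & (1 << j) else skip
                     for j in range(n)]
            # Drop leading placeholders so "* * pills" is just "pills".
            while parts and parts[0] == skip:
                parts.pop(0)
            tokens.append(" ".join(parts + [word]))
    return tokens


if __name__ == "__main__":
    for token in phrase_tokens(["buy", "cheap", "pills"]):
        print(token)
```

Three words already yield seven tokens here, and once the window is
full each word contributes 2**(window - 1) of them, which is where the
wordlist growth comes from.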

> The main disadvantage is that wordlists expand considerably.

They would, of course.  CRM114 trains exclusively on error for that
reason.  Performance becomes an issue too, I suspect.  But those are
challenges we might be able to deal with if the method looks like a
major win.
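
To make "train on error" concrete, a minimal sketch: a message is
added to the wordlists only when the filter currently gets it wrong.
The ToyFilter class and its count-comparing classify() are invented
for the example and are much cruder than what bogofilter or CRM114
actually compute; only the training loop is the point.

```python
from collections import Counter


class ToyFilter:
    """Hypothetical stand-in for a real filter: compares raw token counts."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}

    def classify(self, tokens):
        spam = sum(self.counts["spam"][t] for t in tokens)
        ham = sum(self.counts["ham"][t] for t in tokens)
        # Ties (including "never seen any of this") default to ham.
        return "spam" if spam > ham else "ham"

    def train(self, tokens, label):
        self.counts[label].update(tokens)


def train_on_error(filt, corpus):
    """Train only on misclassified messages; return how many there were."""
    errors = 0
    for tokens, label in corpus:
        if filt.classify(tokens) != label:
            filt.train(tokens, label)
            errors += 1
    return errors


if __name__ == "__main__":
    corpus = [
        (["buy", "pills"], "spam"),
        (["hello", "friend"], "ham"),
        (["buy", "pills", "now"], "spam"),
        (["hello", "again"], "ham"),
    ]
    filt = ToyFilter()
    print(train_on_error(filt, corpus), "message(s) trained")
```

In this toy corpus only the first message is ever stored, so the
wordlists hold two tokens rather than the nine that training on
every message would add.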

-- 
| G r e g  L o u i s          | gpg public key: finger     |
|   http://www.bgl.nu/~glouis |   glouis at consultronics.com |
| http://wecanstopspam.org in signatures fights junk email |




More information about the Bogofilter mailing list