Crm114 style context matching. Phrases and partial phrases.

Peter Bishop pgb at adelard.com
Sat May 17 21:29:45 CEST 2003


On 17 May 2003 at 20:19, bogofilter at aotto.com wrote:

> On 17 May 2003 at 7:22, Jef Poskanzer wrote:
> 
> > Neato.  For N=2 the number of tokens only doubles, and I bet the
> > sensitivity would still be significantly better than N=1.
> 
> It's hard to say what the increase will be 
> - the worst case is  W * W 
> where W is the number of unique words
> 

Actually, come to think of it, the W * W worst case is also very
unlikely, because the mail message would have to be an extremely weird
one in which each word appears many times over, in all possible
combinations.
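
To put a rough number on this, here is a minimal sketch in Python (not
bogofilter or CRM114 code; the function name count_unique_pairs and the
sample message are made up for illustration, and splitting on
whitespace is an assumed stand-in for a real tokenizer). A message of T
words contains only T - 1 adjacent pairs, so the number of unique pairs
can never exceed min(W * W, T - 1):

```python
def count_unique_pairs(text):
    """Return (W, T, P): unique words, total words, unique adjacent pairs."""
    # Assumption: lowercase whitespace splitting stands in for a real lexer.
    words = text.lower().split()
    unique_words = set(words)
    # zip(words, words[1:]) yields every adjacent word pair (the N=2 window).
    unique_pairs = set(zip(words, words[1:]))
    return len(unique_words), len(words), len(unique_pairs)


# Hypothetical sample message, chosen only to show some repeated pairs.
message = "buy cheap pills now buy cheap watches now"
w, t, p = count_unique_pairs(message)

# The pair count is bounded by the message length, not just by W * W.
bound = min(w * w, t - 1)
print(f"W={w} T={t} pairs={p} bound={bound}")
# prints "W=5 T=8 pairs=6 bound=7"
```

Here W * W is 25, but the message only has room for 7 pairs, and one of
those is a repeat, so only 6 new tokens appear.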

-- 
Peter Bishop 
Adelard and Centre for Software Reliability, City University
Drysdale Building, 10 Northampton Square, London, EC1V 0HB
Tel: +44-20-7490-9467, Fax: +44-20-7490-9451
pgb at adelard.com, http://www.adelard.com/
pgb at csr.city.ac.uk, http://www.city.ac.uk/




