What's Coming ...

Peter Bishop pgb at adelard.com
Wed Nov 26 18:51:06 CET 2003


Re degeneration, why not keep this as a standard feature?

This is what Paul Graham does: degeneration is used as a fallback when the 
exact form of a word is not found in the database. I would have thought this 
was particularly useful for setups where the database is only partially 
updated by incoming spam/ham (e.g. train on error and train on uncertain), 
since a specific word form is then less likely to be stored in the 
database.
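
To make the fallback concrete, here is a rough sketch of how such a lookup 
might work. It is not bogofilter's code and not a faithful copy of Paul 
Graham's scheme: the "tag:word" token syntax, the function names 
degenerate_forms and lookup, the dict-based database and the 0.4 default 
for unknown words are all assumptions made for illustration. When several 
degenerate forms are found, it picks the score farthest from 0.5, which is 
my understanding of Graham's approach.

```python
def degenerate_forms(token):
    """Return less specific variants of a token, most specific first.

    Assumes a hypothetical "tag:word" syntax for header-tagged tokens,
    e.g. "subj:FREE". Untagged tokens only get case variants.
    """
    if ":" in token:
        tag, word = token.split(":", 1)
        prefixes = [tag + ":", ""]   # try with the header tag, then without
    else:
        word = token
        prefixes = [""]

    # Case variants: original, Capitalized, lowercase (duplicates dropped).
    cases = [word]
    for variant in (word.capitalize(), word.lower()):
        if variant not in cases:
            cases.append(variant)

    forms = []
    for prefix in prefixes:
        for variant in cases:
            candidate = prefix + variant
            if candidate != token and candidate not in forms:
                forms.append(candidate)
    return forms


def lookup(token, db, default=0.4):
    """Look up a token's spam score, degenerating only if needed.

    db is a plain dict mapping token -> spam probability (an assumption;
    a real word list would be an on-disk database).
    """
    if token in db:
        return db[token]             # exact form stored: no degeneration
    found = [db[f] for f in degenerate_forms(token) if f in db]
    if not found:
        return default               # assumed score for unknown words
    # Of the degenerate forms present, use the most decisive score.
    return max(found, key=lambda p: abs(p - 0.5))


db = {"free": 0.9, "subj:free": 0.99}
print(lookup("subj:FREE", db))   # exact form missing, falls back
print(lookup("FREE", db))        # falls back to the lowercase form
print(lookup("hello", db))       # nothing stored, default is used
```

With a sparsely trained database (train on error), the exact form 
"subj:FREE" will often be missing, so the fallback is what lets the 
evidence stored under "subj:free" or "free" still count.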

On 24 Nov 2003 at 19:59, Shawn Grunberger wrote:

> Number 4 in your remove list is "degeneration code for headers and case
> sensitivity." 
> 
> I'm aware of the -H option, which provides a degeneration option for header
> tagging. Was anything similar ever added for case-sensitivity? I'm only
> aware of the -Pi/PI option, which doesn't degenerate, right? 
> 
> We have hundreds of customers with case-insensitive databases that can't be
> rebuilt, so ideally we'd use something like -H for case that preserves
> accuracy during the transitional phase. 


-- 
Peter Bishop 
Adelard LLP and Centre for Software Reliability, City University
Drysdale Building, 10 Northampton Square, London, EC1V 0HB
Tel: +44-20-7490-9467, Fax: +44-20-7490-9451
pgb at adelard.com, http://www.adelard.com/
pgb at csr.city.ac.uk, http://www.city.ac.uk/





More information about the Bogofilter mailing list