Markup.

Peter Bishop pgb at adelard.com
Sat May 10 14:48:46 CEST 2003


On 9 May 2003 at 13:26, David Relson wrote:

> Like you, I wouldn't worry too much about it.  The benefits seem pretty 
> clear and there's always the occasional message that's virtually impossible 
> to classify - even for a human. 

A quick calculation suggests that false negatives dropped from 14% of spam
to ~13% of spam: a useful improvement, but not a huge one.

It might be the case that removing case folding would have a greater effect.
Joerg Over did some tests on this, but it is hard to judge performance
as the tests were only on 33 spams. At face value the results were:

Robinson:
 3% false negatives drops to 0%

Fisher:
 90% false negatives drops to 6%
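
For scale, here is a rough Python sketch of the message counts those
percentages imply on a 33-spam test set. The counts are inferred from the
rounded percentages and were not reported with the tests; implied_count is
a hypothetical helper name used only for this illustration.

```python
def implied_count(rate_percent, sample_size):
    """Approximate number of messages behind a rounded percentage."""
    return round(rate_percent / 100 * sample_size)

SAMPLE = 33  # number of spams in the tests quoted above

# (method, false negative % with case folding, % without case folding)
results = [("Robinson", 3, 0), ("Fisher", 90, 6)]

for name, before, after in results:
    print(name, implied_count(before, SAMPLE), "->", implied_count(after, SAMPLE))
# Robinson 1 -> 0
# Fisher 30 -> 2
```

On these assumptions the Robinson change amounts to a single message, which
is why a test set this small is hard to draw conclusions from.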

Clearly there are questions about Fisher's performance
(was the database properly set up?),
but it looks like removing case folding results in
a bigger reduction in false negatives than checking
markup tokens does (though more tests are needed).

Would it be possible to have a switch option to disable
case folding? 
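
As an illustration of what such a switch would change, here is a minimal
Python sketch of a tokenizer with an optional case-folding step. It is not
bogofilter's lexer: the tokenize function, its fold_case parameter, and the
word pattern are assumptions made up for this example.

```python
import re

def tokenize(text, fold_case=True):
    """Split text into word tokens, optionally folding them to lower case."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    return [t.lower() for t in tokens] if fold_case else tokens

subject = "FREE offer: get your free trial"

# With folding, "FREE" and "free" are counted as one token.
print(sorted(set(tokenize(subject, fold_case=True))))

# Without folding they stay distinct, so each gets its own statistics.
print(sorted(set(tokenize(subject, fold_case=False))))
```

With folding disabled, a shouted "FREE" can build up its own spam score
separately from "free", at the cost of a larger word list.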

-- 
Peter Bishop 
pgb at adelard.com
pgb at csr.city.ac.uk





