Markup.
Peter Bishop
pgb at adelard.com
Sat May 10 14:48:46 CEST 2003
On 9 May 2003 at 13:26, David Relson wrote:
> Like you, I wouldn't worry too much about it. The benefits seem pretty
> clear and there's always the occasional message that's virtually impossible
> to classify - even for a human.
A quick calculation suggests that false negatives dropped from 14% of spam
to ~13% of spam: a useful improvement, but not a huge one.
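Put in relative terms, going from 14% to ~13% removes roughly one in fourteen of the missed spams. A minimal sketch of that arithmetic follows, using only the two figures above (the 13% is approximate):

```python
def relative_reduction(before_pct: float, after_pct: float) -> float:
    """Fractional drop in the false-negative rate, relative to the starting rate."""
    return (before_pct - after_pct) / before_pct

# Figures from the post: 14% of spam missed before, ~13% after.
print(f"{relative_reduction(14.0, 13.0):.1%}")  # prints "7.1%"
```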
It might be the case that removing case folding would have a greater effect.
Joerg Over did some tests on this, but it is hard to judge performance
because the tests used only 33 spams. At face value, the results were:
  Robinson: false negatives drop from 3% to 0%
  Fisher:   false negatives drop from 90% to 6%
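With only 33 spams, each message is worth about 3 percentage points, so the Robinson change from 3% to 0% may amount to a single message. The sketch below just tabulates that granularity; it assumes the reported percentages were rounded from counts out of 33, which the post does not state:

```python
N_SPAMS = 33  # size of the test set reported above

def pct(count: int, total: int = N_SPAMS) -> float:
    """Percentage of the test set represented by `count` messages."""
    return 100.0 * count / total

# Show how coarse the steps are when the test set has only 33 messages.
for k in range(4):
    print(f"{k} of {N_SPAMS} spams = {pct(k):.1f}%")
```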
Clearly there are issues with Fisher performance
(was the database properly set up?), but it looks like removing
case folding produces a bigger reduction in false negatives than
checking markup tokens does (though more tests are needed).
Would it be possible to have a switch option to disable
case folding?
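To illustrate what such a switch would change: with folding on, "FREE" and "free" map to the same token and share one database entry; with it off, they are learned separately, so a shouted "FREE" can presumably earn its own spam probability. The tokenizer below is a hypothetical sketch; the regex and the `fold_case` parameter are illustrative and are not bogofilter's actual lexer or options:

```python
import re

# Hypothetical token pattern: runs of letters, digits, '$', apostrophes and hyphens.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'-]+")

def tokenize(text: str, fold_case: bool = True) -> list[str]:
    """Split text into tokens, lowercasing them only when fold_case is True."""
    tokens = TOKEN_RE.findall(text)
    if fold_case:
        tokens = [t.lower() for t in tokens]
    return tokens

sample = "FREE offer, free!"
print(tokenize(sample))                   # ['free', 'offer', 'free']
print(tokenize(sample, fold_case=False))  # ['FREE', 'offer', 'free']
```

Disabling folding grows the vocabulary (here from two distinct tokens to three), which is the trade-off such a switch would let users test.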
--
Peter Bishop
pgb at adelard.com
pgb at csr.city.ac.uk