Markup.

David Relson relson at osagesoftware.com
Sat May 10 16:09:28 CEST 2003


At 08:48 AM 5/10/03, you wrote:

>On 9 May 2003 at 13:26, David Relson wrote:
>
> > Like you, I wouldn't worry too much about it.  The benefits seem pretty
> > clear and there's always the occasional message that's virtually
> > impossible to classify - even for a human.
>
>A quick calculation suggests that false negatives dropped from 14% of spam
>to ~13% of spam - a useful improvement, but not a huge one.
>
>It might be the case that removing casefolding would have a greater effect.
>Joerg Over did some tests on this, but it is hard to judge performance
>as the tests were only on 33 spams. At face value the results were:
>
>Robinson:
>  3% false negatives drops to 0%
>
>Fisher:
>  90% false negatives drops to 6%
>
>Clearly there are issues with Fisher performance
>(was the database properly set up?),
>but it looks like removing casefolding results in
>a bigger reduction in false negatives than checking
>markup tokens does (though more tests are needed).
>
>Would it be possible to have a switch option to disable
>case folding?

Peter,

If you want to do a significant test, I'll create a patch that allows 
case-folding to be disabled.

Let me know.

David
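
For illustration only, here is a minimal sketch of what a case-folding
switch in a tokenizer could look like. It is not bogofilter's code and not
the proposed patch: the token pattern, the tokenize() function, and the
fold_case parameter are all hypothetical names chosen for the example.

```python
import re

# Hypothetical token pattern (an assumption, not bogofilter's lexer rules):
# runs of letters, digits, and apostrophes.
TOKEN_RE = re.compile(r"[A-Za-z0-9']+")


def tokenize(text, fold_case=True):
    """Split text into tokens; lower-case them only when fold_case is True."""
    tokens = TOKEN_RE.findall(text)
    if fold_case:
        tokens = [t.lower() for t in tokens]
    return tokens


sample = "FREE offer! Get your free gift"

# With folding, "FREE" and "free" collapse into a single token type.
folded = tokenize(sample)

# Without folding, "FREE" and "free" stay distinct, so a classifier could
# score the shouted form separately from the ordinary one.
unfolded = tokenize(sample, fold_case=False)

print(folded)
print(unfolded)
```

The point of the switch is that both settings can be run over the same
corpus, so false-negative rates with and without folding can be compared
directly.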





