massive false negatives

Peter Bishop pgb at adelard.com
Tue May 6 10:58:23 CEST 2003


On  at , Unknown wrote:

> Ever since I upgraded to 0.11.1.3, I've been getting a Lot of false
> negatives.  In fact, I'm not sure Anything is getting filtered out.
> 
> I recently recreated my db from known spam and ham, but that didn't
> appear to help.
> 
Looking at your attached listing I saw:

"bedroom"                           11  0.000734  0.000000  0.000038  -0.00004 -10.18522 +
"diploma"                           11  0.000734  0.000000  0.000038  -0.00004 -10.18522 +
"diplomas"                          11  0.000734  0.000000  0.000038  -0.00004 -10.18522 +
"imprisonment"                      11  0.000734  0.000000  0.000038  -0.00004 -10.18522 +
"walmart"                           11  0.000734  0.000000  0.000038  -0.00004 -10.18522 +

These words look very spammish to me.

But according to the listing above, 11 good emails contained each of
these words, and *zero* spam emails contained them.
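
The same check can be run over a whole listing. What follows is only a
rough Python sketch, not part of bogofilter: it assumes the second column
is the message count, the third the ham frequency and the fourth the spam
frequency, which is how the lines above are read in this message. The
helper name ham_only_tokens is made up for illustration.

```python
# Listing lines copied from the message above.
LISTING = '''\
"bedroom"                           11  0.000734  0.000000  0.000038  -0.00004 -10.18522 +
"diploma"                           11  0.000734  0.000000  0.000038  -0.00004 -10.18522 +
"diplomas"                          11  0.000734  0.000000  0.000038  -0.00004 -10.18522 +
"imprisonment"                      11  0.000734  0.000000  0.000038  -0.00004 -10.18522 +
"walmart"                           11  0.000734  0.000000  0.000038  -0.00004 -10.18522 +
'''


def ham_only_tokens(listing):
    """Return (token, count) pairs seen in ham but never in spam.

    Column meanings are an assumption: token, count, ham frequency,
    spam frequency, then further fields that are ignored here.
    """
    hits = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # blank line or something that is not a token row
        try:
            count = int(fields[1])
            ham_freq = float(fields[2])
            spam_freq = float(fields[3])
        except ValueError:
            continue  # header or other non-numeric line
        if ham_freq > 0.0 and spam_freq == 0.0:
            hits.append((fields[0].strip('"'), count))
    return hits


if __name__ == "__main__":
    for token, count in ham_only_tokens(LISTING):
        print(f"{token}: {count} ham, no spam")
```

If words you would expect in spam show up in that output, the word lists
were probably built the wrong way round or from mixed training files.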

This looks very much like "finger trouble" when you rebuilt the database,
e.g. using -n when training with spam, so the words were put on the wrong
list. Or maybe the training files have been "polluted"
(e.g. the ham training set contains some spam).
-- 
Peter Bishop 
pgb at adelard.com
pgb at csr.city.ac.uk





