Detecting false-positives

Jef Poskanzer jef at mail.acme.com
Sun May 22 22:45:51 CEST 2005


>How do those of you detect false-positives?
>
>Especially those of you with high volumes of individual mail?  
>
>As you can see in this plot, there are periods where I get well 
>over 1,000 messages each day, and it really defeats the purpose 
>of filtering if I have to check 600-700 messages per day for 
>false positives :)

I get about 2,000,000 messages per day, but only about 25,000/day
make it as far as the Bayesian layer, where I use bogofilter and qsf
combined in a voting arrangement.  97% of those score the max for
both filters; those I send directly to the bit bucket.  That leaves
about 700/day, of which 85% or about 600/day are classified as spam.
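
As a quick check of the rounding in those figures, here is a minimal sketch in integer arithmetic. The variable names are illustrative only; the inputs are the approximate counts quoted above.

```python
# Approximate daily counts quoted in the message above.
reaching_bayes = 25_000      # messages/day that reach the Bayesian layer
max_score_pct = 97           # percent scoring the max on both filters

# Messages left after the max-score ones are discarded.
left_after_discard = reaching_bayes * (100 - max_score_pct) // 100

# The message rounds the remainder to "about 700/day"; 85% of that is spam.
rounded_remainder = 700
flagged_spam = rounded_remainder * 85 // 100

print(left_after_discard)    # 750, i.e. "about 700"
print(flagged_spam)          # 595, i.e. "about 600"
```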

These numbers are much higher than usual right now due to all the
Nazi spam from sober.q.  Before that it was about 1/10th as much.

Anyway, the spam-but-not-max-spam messages get filed in a folder,
and once or twice a day I scan the subjects.  Maybe once a week
I find a false positive in there, typically someone I've never
corresponded with before.
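
The three-way routing described above could be sketched roughly as follows. This is only an illustration, not the actual setup: the message does not say how the two filters' votes are combined, so the 0-to-1 score scale, the averaging rule, and the names MAX_SCORE, SPAM_CUTOFF, and route are all assumptions made for the example.

```python
from enum import Enum


class Route(Enum):
    DISCARD = "discard"   # bit bucket: both filters at max
    REVIEW = "review"     # spam folder, subjects scanned by hand
    INBOX = "inbox"       # delivered normally


# Assumptions for illustration: both scores normalized to 0.0..1.0.
MAX_SCORE = 1.0
SPAM_CUTOFF = 0.5


def route(bogofilter_score: float, qsf_score: float) -> Route:
    """Route one message from two (hypothetical, normalized) filter scores."""
    # Both filters at the maximum: discard without review.
    if bogofilter_score >= MAX_SCORE and qsf_score >= MAX_SCORE:
        return Route.DISCARD
    # Assumed voting rule: average the two scores and compare to a cutoff.
    if (bogofilter_score + qsf_score) / 2 >= SPAM_CUTOFF:
        return Route.REVIEW
    return Route.INBOX


print(route(1.0, 1.0).value)
print(route(1.0, 0.6).value)
print(route(0.1, 0.2).value)
```

The point of the middle bucket is that only spam-but-not-max-spam mail ever needs a human look, which keeps the daily review to a few hundred subject lines.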
---
Jef

       Jef Poskanzer  jef at mail.acme.com  http://www.acme.com/jef/


