Questions about spamicity

Greg Louis glouis at dynamicro.on.ca
Fri May 30 01:10:10 CEST 2003


On 20030529 (Thu) at 1525:25 -0700, Michael Rensing wrote:
> Does it make sense for a message to have spamicity=0.000000? That's
> what's getting put into my message headers. As in:
> 
> X-Bogosity:  No, tests=bogofilter, spamicity=0.000000, version=0.13.2.1

Possibly.  If you are using the Fisher evaluation, most nonspam will in
fact have scores under 0.0000005, and therefore round to zero when
printed with six decimal places.
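
To make the rounding concrete, here is a minimal sketch in Python.  The
scores are made-up values chosen to sit on either side of the 0.0000005
cutoff, and show() is a hypothetical helper, not bogofilter code; it
only mimics a printf-style format with six decimal places.

```python
def show(score):
    """Format a score the way a six-decimal printf-style format would."""
    return "%0.6f" % score

# Illustrative scores only; none of these come from a real message.
for score in (0.0000004, 0.0000006, 0.9999996):
    print(score, "->", show(score))
```

Anything under 0.0000005 prints as 0.000000 even though the stored value
is not exactly zero, and anything over 0.9999995 prints as 1.000000.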

> It seems to me that for a statistical method, there should virtually
> never be a perfect 0 or 1 for a rating.

True.  But the Fisher evaluation will rate most nonspams below
0.0000005 and most spams above 0.9999995, so they will _look_ like 0's
and 1's when you use the %0.6f format.
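
For anyone wondering why the scores pile up so close to the extremes,
the following is a rough Python sketch of Fisher-style combining as
Gary Robinson described it.  It is not bogofilter's actual code: the
names chi2q and fisher_score are hypothetical, the per-word
probabilities are invented, and the smoothing and word selection that
a real filter applies are left out.

```python
import math

def chi2q(x2, dof):
    """Upper-tail chi-square probability; valid for even dof only."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def fisher_score(word_probs):
    """Combine per-word spam probabilities (each strictly between 0 and 1)."""
    n = len(word_probs)
    h = chi2q(-2.0 * sum(math.log(p) for p in word_probs), 2 * n)
    s = chi2q(-2.0 * sum(math.log(1.0 - p) for p in word_probs), 2 * n)
    return (1.0 + h - s) / 2.0

# Assumed inputs: 30 words that each lean only mildly one way.
print("%0.6f" % fisher_score([0.1] * 30))  # mildly nonspam-leaning words
print("%0.6f" % fisher_score([0.9] * 30))  # mildly spam-leaning words
```

Even when each word leans only mildly one way, thirty of them agreeing
is enough to push the combined score past the point where six decimal
places can distinguish it from 0 or 1.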

> However, that's what I'm getting
> for all of my messages.

That, on the other hand, suggests a pathology in your training
database.  You should certainly not be getting values near zero for
spam.

> When I run bogofilter -M -v against my spam mailbox, only a few have a
> non-zero spamicity. Any ideas what's going on? Do I need to reset the
> database somehow? If so, how?

You don't mention a few things we'd need to know to be able to help
effectively:
- what version of bogofilter are you running?
- how did you create your training database?
- how many spam and nonspam messages were used in that process?
- how did you decide what was spam and what was nonspam for purposes of
  training?

I suspect the solution to your problem can be deduced from the answers
to those four questions (mainly the last 3 of course).

-- 
| G r e g  L o u i s          | gpg public key: finger     |
|   http://www.bgl.nu/~glouis |   glouis at consultronics.com |
| http://wecanstopspam.org in signatures fights junk email |



