Questions about spamicity
Greg Louis
glouis at dynamicro.on.ca
Fri May 30 01:10:10 CEST 2003
On 20030529 (Thu) at 1525:25 -0700, Michael Rensing wrote:
> Does it make sense for a message to have spamicity=0.000000? That's
> what's getting put into my message headers. As in:
>
> X-Bogosity: No, tests=bogofilter, spamicity=0.000000, version=0.13.2.1
Possibly. If you are using the Fisher evaluation, most nonspam will in
fact have scores under 0.0000005, and therefore round to zero when
printed with six decimal places.
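
To see why the scores pile up at the extremes, here is a rough sketch of
Fisher's method of combining per-word probabilities. It assumes the
chi-square combining scheme as Gary Robinson described it. The names
chi2q and fisher_score and the sample word probabilities are made up for
illustration, and this is not bogofilter's own code.

```python
import math

def chi2q(x2, dof):
    """Chance that a chi-square variable with an even number of
    degrees of freedom (dof) exceeds x2."""
    assert dof % 2 == 0
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def fisher_score(word_probs):
    """Combine per-word spam probabilities (each strictly between
    0 and 1) into one score between 0 and 1."""
    n = len(word_probs)
    # Close to 1 when the words look spammy, close to 0 when they do not.
    spammy = chi2q(-2.0 * sum(math.log(p) for p in word_probs), 2 * n)
    # Close to 1 when the words look like nonspam.
    hammy = chi2q(-2.0 * sum(math.log(1.0 - p) for p in word_probs), 2 * n)
    return (1.0 + spammy - hammy) / 2.0

# Hypothetical messages: 30 words that all lean the same way.
print(fisher_score([0.1] * 30))   # mildly nonspam words
print(fisher_score([0.9] * 30))   # mildly spam words
print(fisher_score([0.5] * 30))   # completely neutral words
```

Even with every word only mildly off neutral, thirty of them agreeing
pushes the combined score very close to 0 or to 1.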
> It seems to me that for a statistical method, there should virtually
> never be a perfect 0 or 1 for a rating.
True. But the Fisher evaluation will rate most nonspams below
0.0000005 and most spams above 0.9999995, so they will _look_ like 0's
and 1's when you use the %0.6f format.
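
A quick illustration of the rounding, using Python's printf-style
formatting as a stand-in for the header output (format_spamicity is a
made-up helper and the scores are made-up values):

```python
def format_spamicity(score):
    """Format a score with six decimal places, as in the X-Bogosity header."""
    return "%0.6f" % score

# Scores within 0.0000005 of an extreme print as exactly 0 or 1.
for score in (0.0000004, 0.0000006, 0.9999996, 0.5):
    print(repr(score), "->", format_spamicity(score))
```

The underlying value is never exactly 0 or 1; only the printed form is.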
> However, that's what I'm getting
> for all of my messages.
That, on the other hand, suggests a pathology in your training
database. You should certainly not be getting values near zero for
spam.
> When I run bogofilter -M -v against my spam mailbox, only a few have a
> non-zero spamicity. Any ideas what's going on? Do I need to reset the
> database somehow? If so, how?
You don't mention a few things we'd need to know to be able to help
effectively:
- what version of bogofilter are you running?
- how did you create your training database?
- how many spam and nonspam messages were used in that process?
- how did you decide what was spam and what was nonspam for purposes of
  training?
I suspect the solution to your problem can be deduced from the answers
to those four questions (mainly the last 3 of course).
--
| G r e g L o u i s | gpg public key: finger |
| http://www.bgl.nu/~glouis | glouis at consultronics.com |
| http://wecanstopspam.org in signatures fights junk email |