bogofilter's default algorithm

Greg Louis glouis at dynamicro.on.ca
Wed Jan 22 12:42:34 CET 2003


On 20030121 (Tue) at 1937:19 -0800, Allyn Fratkin wrote:
> 
> i tried fisher over the last 3 or 4 days  and it misclassified 7
> spams that robinson would have caught.  again with default cutoffs.

The default cutoffs for Fisher need tweaking.  What's in there now is
what happened to work best for me several months ago when my training
db was still growing fast.

> my procmailrc loses the "unsure" classification and unsure messages get
> mapped into ham.  so it is likely that the 7 spams were unsure.
> but i don't know how to tweak the cutoffs and i suspect that many
> other users aren't interested in that either.

Then they shouldn't be using bogofilter at all -- perhaps that's a bit
too strong, but there is simply no way to set a group of defaults that
will work well for everyone in every situation.
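
To illustrate why the cutoffs matter when "unsure" is folded into ham,
here is a minimal sketch in Python.  It is not bogofilter's code: the
function names and both cutoff values are assumptions chosen only for
the example.

```python
# Illustrative only: names and cutoff values are assumptions,
# not bogofilter's actual defaults.
HAM_CUTOFF = 0.10
SPAM_CUTOFF = 0.95


def classify(score, ham_cutoff=HAM_CUTOFF, spam_cutoff=SPAM_CUTOFF):
    """Three-way decision from a spamicity score in [0, 1]."""
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "ham"
    return "unsure"


def deliver(score):
    """What a delivery recipe does if it only tests for 'spam'."""
    return "spam" if classify(score) == "spam" else "ham"


for s in (0.02, 0.60, 0.93, 0.99):
    print(s, classify(s), deliver(s))
```

A message scoring just under the spam cutoff is "unsure" to the
classifier but lands in the inbox as ham, so it is counted as a miss
even though the filter flagged it as questionable.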

> my vote would be keeping robinson as the default algorithm because it
> seems to work better "out of the box" without tweaking.
  ^^^^^ (my emphasis)
Sheer luck for you.  Robinson-GM is even fussier about the spam cutoff
value than Robinson-Fisher, so if it happens to work decently for you
the way it's delivered, that must be because your email corpus is
similar in nature to the one that was used to determine the default
parameters for Robinson-GM.  If we took the trouble to tune
Robinson-Fisher's defaults to match what Robinson-GM is doing, nobody
would see any difference in fp or fn counts.  I think we ought to try,
because the present situation is obviously misleading people into
believing that one or the other "works better."  I'll have a go.
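
One way such a comparison could be set up, sketched in Python: score a
test corpus with each algorithm, pick for each the lowest spam cutoff
that stays within the same fp budget, and then compare fn counts.  The
function name and the scores below are made up for illustration; this
is not a description of how bogofilter's defaults were actually chosen.

```python
def tune_spam_cutoff(ham_scores, spam_scores, max_fp=0):
    """Return (cutoff, fp, fn) for the lowest cutoff with fp <= max_fp.

    A message is called spam when its score is >= cutoff.
    """
    # Only observed scores (plus 1.0) need to be tried as cutoffs.
    candidates = sorted(set(ham_scores) | set(spam_scores) | {1.0})
    for cutoff in candidates:
        fp = sum(s >= cutoff for s in ham_scores)
        if fp <= max_fp:
            fn = sum(s < cutoff for s in spam_scores)
            return cutoff, fp, fn
    raise ValueError("no cutoff meets the fp target")


# Hypothetical scores from one algorithm on a small test corpus.
ham = [0.01, 0.05, 0.30, 0.52]
spam = [0.48, 0.70, 0.97, 0.99]

print(tune_spam_cutoff(ham, spam, max_fp=0))
print(tune_spam_cutoff(ham, spam, max_fp=1))
```

Running the same procedure on each algorithm's scores for the same
messages puts both at the same fp count, which is the only fair basis
for comparing their fn counts.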

> [Those who] are interested in tweaking can switch to fisher and get
> better results (theoretically) but with more work involved.

You can't get better discrimination given a fixed training db. 
Robinson-GM can be tuned to do as well as Robinson-Fisher and vice
versa.  What you do win with Fisher is a clearer indication of whether
the decision is clear-cut or questionable; this, in turn, can make for
what I believe to be more efficient training.
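
A small Python sketch of the two combining rules, as I understand
Robinson's published descriptions, may make the point concrete.  It is
not bogofilter's implementation; the function names are my own, and
the per-token probabilities are invented and must lie strictly between
0 and 1.

```python
import math


def chi2q(x2, dof):
    """P(chi-square with an even number of dof exceeds x2)."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def robinson_gm(probs):
    """Geometric-mean combining of per-token spam probabilities."""
    n = len(probs)
    p = 1.0 - math.exp(sum(math.log(1.0 - f) for f in probs) / n)
    q = 1.0 - math.exp(sum(math.log(f) for f in probs) / n)
    return (1.0 + (p - q) / (p + q)) / 2.0


def robinson_fisher(probs):
    """Fisher (inverse chi-square) combining of the same probabilities."""
    n = len(probs)
    h = chi2q(-2.0 * sum(math.log(f) for f in probs), 2 * n)
    s = chi2q(-2.0 * sum(math.log(1.0 - f) for f in probs), 2 * n)
    return (1.0 + h - s) / 2.0


for n in (3, 10):
    tokens = [0.8] * n  # n mildly spammy tokens (made-up values)
    print(n, robinson_gm(tokens), robinson_fisher(tokens))

mixed = [0.8] * 5 + [0.2] * 5  # evidence that cancels out
print(robinson_gm(mixed), robinson_fisher(mixed))
```

With uniformly mild evidence the geometric-mean score stays at 0.8
however many tokens agree, while the Fisher score moves toward 1 as
agreeing tokens accumulate; with evenly conflicting evidence both sit
at 0.5.  That spread toward the extremes is what leaves a usable
"unsure" band in the middle.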

-- 
| G r e g  L o u i s          | gpg public key:      |
|   http://www.bgl.nu/~glouis |   finger greg at bgl.nu |
| Help free our mailboxes. Include                   |
|        http://wecanstopspam.org in your signature. |



