Spam / ham registration issue

Tom Anderson tanderso at oac-design.com
Wed Mar 3 14:38:02 CET 2004


On Wed, 2004-03-03 at 08:25, David Relson wrote:
> Pretty much.  The basic principle is comparing the likelihood of the word
> being in spam to the word being in ham.  You've maxed out both of them :-)
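To make "maxed out both of them" concrete, here is a minimal sketch of a per-word score, assuming a Graham/Robinson-style ratio: the fraction of registered spams containing the word, divided by that fraction plus the fraction of registered hams containing it. The function name `word_spamicity` is hypothetical, and bogofilter's real scoring adds further steps (such as smoothing for rarely seen words) that are omitted here.

```python
def word_spamicity(spam_with_word: int, n_spam: int,
                   ham_with_word: int, n_ham: int) -> float:
    """Hypothetical Graham/Robinson-style score: compare how often a word
    appears in registered spam with how often it appears in registered ham."""
    spam_ratio = spam_with_word / n_spam  # fraction of spams containing the word
    ham_ratio = ham_with_word / n_ham     # fraction of hams containing the word
    if spam_ratio + ham_ratio == 0:
        return 0.5  # word never seen: no evidence either way
    return spam_ratio / (spam_ratio + ham_ratio)


# A word seen in every spam AND every ham is maxed out on both sides,
# so it lands exactly in the middle:
print(word_spamicity(20, 20, 20, 20))            # 0.5
# A word seen in most spam but few hams leans strongly toward spam:
print(round(word_spamicity(18, 20, 1, 20), 3))   # 0.947
```

A score of 0.5 carries no information, which is why a token that appears in every message of both kinds cannot push the classification either way.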

So registering _other_ hams and spams not having these tokens would tend
to have more effect than registering this same one over and over?
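Under the same assumed ratio, a short sketch suggests the answer is yes. The starting counts below are made up for illustration: the token is in all 10 registered spams and all 10 registered hams. Re-registering the same spam raises both the token count and the message total together, while registering other hams that lack the token only grows the ham total.

```python
def word_spamicity(spam_with_word, n_spam, ham_with_word, n_ham):
    """Hypothetical Graham/Robinson-style score (same sketch as above)."""
    spam_ratio = spam_with_word / n_spam
    ham_ratio = ham_with_word / n_ham
    if spam_ratio + ham_ratio == 0:
        return 0.5
    return spam_ratio / (spam_ratio + ham_ratio)


# Made-up starting point: token present in all 10 spams and all 10 hams.
spam_with, n_spam, ham_with, n_ham = 10, 10, 10, 10
print(word_spamicity(spam_with, n_spam, ham_with, n_ham))          # 0.5

# Re-registering the SAME spam 5 more times: numerator and total both grow,
# so the spam fraction stays at 100% and the score does not move.
print(word_spamicity(spam_with + 5, n_spam + 5, ham_with, n_ham))  # 0.5

# Registering 10 OTHER hams without the token: only n_ham grows, the ham
# fraction drops to 50%, and the score shifts toward spam (about 0.67).
print(word_spamicity(spam_with, n_spam, ham_with, n_ham + 10))
```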

> An alternate view of the world would use message counts rather than
> percentages of words in messages.  The alternate view could give us "I get
> 5 times as much spam as ham, so the odds are 5:1 that the next message is
> spam."
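One way to read the "alternate view" is the odds form of Bayes' rule: the message counts supply prior odds, and the word evidence multiplies them by a likelihood ratio, P(words | spam) / P(words | ham). The sketch below assumes that reading; the function name and the counts of 500 spams and 100 hams are hypothetical.

```python
def posterior_spam_probability(n_spam_msgs: int, n_ham_msgs: int,
                               likelihood_ratio: float) -> float:
    """Odds form of Bayes' rule (illustrative sketch, not bogofilter code).

    likelihood_ratio = P(words | spam) / P(words | ham), i.e. what the
    token evidence alone says; the message counts supply the prior odds."""
    prior_odds = n_spam_msgs / n_ham_msgs          # 500 spams / 100 hams -> 5.0
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)   # convert odds to probability


# Neutral word evidence (ratio 1.0) leaves only the 5:1 prior, i.e. 5/6:
print(posterior_spam_probability(500, 100, 1.0))
```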

Although it sounds almost reasonable, it fails for the same reason as
racial profiling.  The innocent ones get harassed unduly.  Biasing every
email 5:1 toward spam would lead to an inordinate number of false
positives.
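The false-positive concern can be illustrated with a toy calculation. The ten likelihood ratios below are invented for illustration (not measured data) and all describe legitimate messages whose words lean toward ham (ratio below 1). The 0.5 cutoff is also an assumption; bogofilter's actual spam cutoff is configurable and typically much higher.

```python
# Invented likelihood ratios for ten legitimate (ham) messages.
# Values < 1 mean the words themselves favour ham.
ham_likelihood_ratios = [0.05, 0.1, 0.15, 0.25, 0.3, 0.4, 0.5, 0.6, 0.8, 0.9]


def is_spam(likelihood_ratio: float, prior_odds: float,
            threshold: float = 0.5) -> bool:
    """Classify using posterior probability from prior odds x likelihood ratio."""
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds) > threshold


def count_false_positives(ratios, prior_odds: float) -> int:
    """Every message in `ratios` is ham, so any spam verdict is a false positive."""
    return sum(is_spam(r, prior_odds) for r in ratios)


print(count_false_positives(ham_likelihood_ratios, 1.0))  # neutral prior: 0
print(count_false_positives(ham_likelihood_ratios, 5.0))  # 5:1 spam prior: 7
```

With a neutral prior none of these hams is flagged, but multiplying every message's odds by 5 flips any ham whose likelihood ratio exceeds 0.2, which is the "innocent ones get harassed" effect.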

Tom



More information about the Bogofilter mailing list