Spam / ham registration issue

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Wed Mar 3 14:58:40 CET 2004


Tom Anderson wrote:

>> Pretty much.  The basic principle is comparing the likelihood of the word
>> being in spam to the word being in ham.  You've maxed out both of them :-)
> 
> So registering _other_ hams and spams not having these tokens would tend
> to have more effect than registering this same one over and over?

Absolutely. I once reported the seeming oddity that when you
register another spam message, a different spam message might
no longer be rated as spam, as it was before. This is
because of that effect.
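
To make that effect concrete, here is a minimal Python sketch of a toy
per-token estimate. It is an illustration under stated assumptions, not
Bogofilter's actual computation: the class ToyFilter, its methods, and the
example tokens are all hypothetical, and the score is simply the token's
spam frequency divided by the sum of its spam and ham frequencies.

```python
from collections import Counter


class ToyFilter:
    """Hypothetical toy model: counts messages containing each token."""

    def __init__(self):
        self.spam_msgs = 0
        self.ham_msgs = 0
        self.spam_tokens = Counter()
        self.ham_tokens = Counter()

    def register(self, tokens, is_spam):
        # Count each token at most once per message.
        unique = set(tokens)
        if is_spam:
            self.spam_msgs += 1
            self.spam_tokens.update(unique)
        else:
            self.ham_msgs += 1
            self.ham_tokens.update(unique)

    def token_score(self, token):
        # Fraction of spam (resp. ham) messages that contain the token.
        spam_freq = self.spam_tokens[token] / self.spam_msgs if self.spam_msgs else 0.0
        ham_freq = self.ham_tokens[token] / self.ham_msgs if self.ham_msgs else 0.0
        if spam_freq + ham_freq == 0:
            return 0.5  # token never seen: treat as neutral
        return spam_freq / (spam_freq + ham_freq)


f = ToyFilter()
f.register(["viagra", "offer"], is_spam=True)
f.register(["meeting", "offer"], is_spam=False)
before = f.token_score("offer")

# Registering the very same spam again changes nothing for its tokens.
f.register(["viagra", "offer"], is_spam=True)
repeated = f.token_score("offer")
maxed_out = f.token_score("viagra")

# Registering a different spam without "offer" dilutes its spam frequency.
f.register(["lottery", "winner"], is_spam=True)
after = f.token_score("offer")
```

In this toy model, re-registering the same spam leaves "offer" at 0.5 and
"viagra" at 1.0, because both frequencies are already saturated. Registering
an unrelated spam lowers the spam frequency of "offer" from 2/2 to 2/3, so
its score drops to 0.4, and another spam that relied on "offer" would now
look less spammy.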

>> An alternate view of the world would use message counts rather than
>> percents of words in messages.  The alternate view could give us "I get
>> 5 times as much spam as ham, so the odds are 5:1 that the next message is
>> spam."
> 
> Although it sounds almost reasonable, it fails for the same reason as
> racial profiling.  The innocent ones get harassed unduly.  Biasing 5:1
> toward spam on each email would lead to an inordinate amount of false
> positives.

Most likely. So Bogofilter does not follow this road.
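
The false-positive argument above can be sketched with a little arithmetic.
This is only an illustration under assumptions: the function name, the
textbook odds-form update (posterior odds = prior odds times likelihood
ratio), and the example numbers are hypothetical and do not describe how
Bogofilter scores mail.

```python
def posterior_spam_probability(prior_odds, likelihood_ratio):
    """Combine prior odds of spam with the evidence from the message.

    prior_odds:       odds of spam before looking at the message (5.0 means 5:1)
    likelihood_ratio: how much more likely the message's words are under
                      spam than under ham (1.0 means the words say nothing)
    """
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# A message whose words carry no evidence either way.
neutral = 1.0
with_prior = posterior_spam_probability(5.0, neutral)
without_prior = posterior_spam_probability(1.0, neutral)

# A message whose words lean toward ham by 2:1.
ham_leaning = posterior_spam_probability(5.0, 0.5)
```

With a 5:1 prior, a message with perfectly neutral words already comes out
at 5/6 spam, and even one whose words favor ham 2:1 comes out at 5/7, so
such mail would tend to be flagged before it has done anything wrong.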

pi



