[cvs] Potential for error?

David Relson relson at osagesoftware.com
Tue Oct 22 05:17:31 CEST 2002


At 11:02 PM 10/21/02, you wrote:

>>Similarly, one could periodically discard any tokens whose good+spam
>>count is 1.
>
>did you mean good=spam?  i think you would definitely
>want to keep a word that only appeared in one of the lists.

good+spam is my shorthand for adding together a token's counts from the 
good word list and from the spam word list.

good+spam=1 is thus notation for a "singleton", i.e. a word so unusual (or 
misspelled) that it has been seen exactly once, in exactly one message, 
while training bogofilter.  A word that rare may well never be seen 
again.  Given that train of thought, why keep it around?
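
To make the distinction concrete, here is a rough sketch in Python. It is not 
bogofilter's actual code: the dict-based word lists and the name 
prune_singletons are made up for illustration. A token is dropped only when 
its good and spam counts add up to 1; a token seen many times in just one 
list is kept.

```python
def prune_singletons(good, spam):
    """Return copies of both word lists with singleton tokens removed.

    good and spam are hypothetical dicts mapping token -> count; a
    singleton is any token whose combined count (good + spam) is 1.
    """
    tokens = set(good) | set(spam)
    # Keep every token that has been seen more than once in total.
    keep = {t for t in tokens if good.get(t, 0) + spam.get(t, 0) > 1}
    pruned_good = {t: c for t, c in good.items() if t in keep}
    pruned_spam = {t: c for t, c in spam.items() if t in keep}
    return pruned_good, pruned_spam


# Made-up counts, purely for illustration.
good = {"meeting": 5, "xyzzy": 1, "offer": 1}
spam = {"offer": 1, "viagra": 7, "qwertz": 1}
good, spam = prune_singletons(good, spam)
```

In this example "xyzzy" and "qwertz" are discarded, "offer" survives because 
its combined count is 2, and "meeting" and "viagra" survive even though each 
appears in only one list.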

David




