[cvs] Potential for error?

Tom Allison tallison at tacocat.net
Tue Oct 22 12:03:28 CEST 2002


David Relson wrote:
> At 11:02 PM 10/21/02, you wrote:
> 
>>> Similarly, one could periodically discard any tokens whose good+spam
>>> count is 1.
>>
>>
>> did you mean good=spam?  i think you would definitely
>> want to keep a word that only appeared in one of the lists.
> 
> 
> good+spam is my shorthand for adding together a token's counts from the 
> good word list and from the spam word list.
> 
> good+spam=1 thus is a notation for a "singleton", i.e. a word that is so 
> unusual (or incorrectly spelled) that it has been seen exactly 1 time in 
> exactly 1 message while training bogofilter.  As a singleton it could be 
> a word that is so unusual that it won't be seen ever again.  Given that 
> train of thought, why keep it around?
> 
> David

What criteria would you use to remove it?
If a new token such as p0rn starts showing up, it begins as a singleton 
but will eventually have great importance.  You have to give it some time 
before it expires.
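
One possible criterion, sketched in Python purely as an illustration: drop a 
singleton only after it has gone unseen for some grace period.  This assumes 
each token carries a last-seen timestamp alongside its good and spam counts, 
which is an assumption for the sketch and not a description of how bogofilter 
stores its word lists.  The names TokenStats, prune_singletons and grace_days 
are hypothetical.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86400


@dataclass
class TokenStats:
    good: int         # count from the good word list
    spam: int         # count from the spam word list
    last_seen: float  # hypothetical field: Unix time of the latest sighting


def prune_singletons(tokens, now, grace_days=30):
    """Return a new dict without singletons older than the grace period.

    A singleton is a token whose good + spam count is exactly 1.
    Singletons seen within the last `grace_days` days are kept so that
    a newly appearing token has time to accumulate counts.
    """
    cutoff = now - grace_days * SECONDS_PER_DAY
    return {
        word: stats
        for word, stats in tokens.items()
        if not (stats.good + stats.spam == 1 and stats.last_seen < cutoff)
    }


# Usage: a fixed "now" keeps the example deterministic.
now = 100 * SECONDS_PER_DAY
tokens = {
    "p0rn": TokenStats(good=0, spam=1, last_seen=now - 2 * SECONDS_PER_DAY),
    "xyzzy": TokenStats(good=1, spam=0, last_seen=now - 90 * SECONDS_PER_DAY),
    "viagra": TokenStats(good=0, spam=40, last_seen=now - 90 * SECONDS_PER_DAY),
}
kept = prune_singletons(tokens, now)
```

Here the recent singleton p0rn survives, the stale singleton xyzzy is 
dropped, and viagra is kept regardless of age because its count is above 1.  
The 30-day default is arbitrary; the right grace period would depend on how 
much mail is being trained.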

-- 
Scintillation is not always identification for an auric substance.




