[cvs] Potential for error?
Tom Allison
tallison at tacocat.net
Tue Oct 22 12:03:28 CEST 2002
David Relson wrote:
> At 11:02 PM 10/21/02, you wrote:
>
>>> Similarly, one could periodically discard any tokens whose good+spam
>>> count is 1.
>>
>>
>> Did you mean good=spam? I think you would definitely
>> want to keep a word that only appeared in one of the lists.
>
>
> good+spam is my shorthand for adding together a token's counts from the
> good word list and from the spam word list.
>
> good+spam=1 thus is a notation for a "singleton", i.e. a word that is so
> unusual (or incorrectly spelled) that it has been seen exactly 1 time in
> exactly 1 message while training bogofilter. As a singleton it could be
> a word that is so unusual that it won't be seen ever again. Given that
> train of thought, why keep it around?
>
> David
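The singleton rule David describes can be sketched as a filter over the word counts. This is a minimal illustration, assuming a hypothetical in-memory wordlist that maps each token to a (good_count, spam_count) pair. It is not bogofilter's actual storage format or code, and `prune_singletons` is a made-up name.

```python
from typing import Dict, Tuple

# Hypothetical wordlist: token -> (good_count, spam_count).
Counts = Tuple[int, int]


def is_singleton(counts: Counts) -> bool:
    """A token is a singleton when good + spam == 1, i.e. it was seen
    exactly once, in exactly one training message."""
    good, spam = counts
    return good + spam == 1


def prune_singletons(wordlist: Dict[str, Counts]) -> Dict[str, Counts]:
    """Return a copy of the wordlist with all singleton tokens removed."""
    return {tok: c for tok, c in wordlist.items() if not is_singleton(c)}


if __name__ == "__main__":
    words = {"viagra": (0, 42), "meeting": (17, 1), "xqzzt": (0, 1), "teh": (1, 0)}
    print(sorted(prune_singletons(words)))
```

Note that this is different from good=spam: a token seen once as good and once as spam sums to 2 and is kept. Only tokens seen exactly once overall are dropped, regardless of which list they came from.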
What criteria do you use to remove it?
If I start using a new token, say p0rn, it starts out as a singleton but
will eventually have great importance. You have to give it some time
before expiring it.
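One way to address this objection is to expire a singleton only after it has gone unseen for a grace period, so a new token like p0rn has time to accumulate counts. The following is a sketch under stated assumptions. The `last_seen` timestamp, the `grace` period, and the `train` and `prune_stale_singletons` helpers are all hypothetical; nothing in this thread says bogofilter records when a token was last seen.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict


@dataclass
class TokenStats:
    good: int
    spam: int
    last_seen: datetime  # hypothetical: time the token was last trained on


def train(wordlist: Dict[str, TokenStats], token: str, is_spam: bool,
          when: datetime) -> None:
    """Count one occurrence of token and refresh its last_seen time."""
    st = wordlist.setdefault(token, TokenStats(0, 0, when))
    if is_spam:
        st.spam += 1
    else:
        st.good += 1
    st.last_seen = when


def prune_stale_singletons(wordlist: Dict[str, TokenStats], now: datetime,
                           grace: timedelta) -> Dict[str, TokenStats]:
    """Drop tokens seen exactly once (good + spam == 1) whose last
    sighting is older than the grace period; keep everything else."""
    kept = {}
    for tok, st in wordlist.items():
        singleton = st.good + st.spam == 1
        stale = now - st.last_seen > grace
        if singleton and stale:
            continue  # seen once, long ago: treat as noise
        kept[tok] = st
    return kept
```

With a 30-day grace period, a p0rn singleton trained two days ago survives pruning, and a second sighting within that window lifts it to good+spam=2, so the singleton rule no longer applies to it.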
--
Scintillation is not always identification for an auric substance.
More information about the Bogofilter mailing list