token pairs [was: Algorithm limitations]
David Relson
relson at osagesoftware.com
Wed Apr 14 13:39:51 CEST 2004
On Wed, 14 Apr 2004 08:47:19 +0200
Boris 'pi' Piwinger wrote:
> David Relson <relson at osagesoftware.com> wrote:
>
> >Scanning the message happens as normal. Each token seen is returned
> >for scoring. Additionally, a token pair is created using each token
> >and its predecessor (with a colon separating them).
>
> There is the risk that this creates a token pair which looks
> like a tagged entry, since both use a colon.
>
> pi
True. The odds are low, but it can happen. One could use a leading
colon or a pair of colons, as in ":token:pair" or "token::pair", to
avoid the problem. As there are only a few tags in use, I think the
risk of a collision is pretty low. Also, since most tokens have fairly
neutral scores, the chance of getting an incorrect result because of a
collision is even smaller.
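
To make the collision concrete, here is a rough sketch in Python (not
bogofilter's actual C code). The names with_pairs, looks_tagged, and
TAGS are made up for illustration, the tag list is only an example, and
the sketch assumes plain tokens never contain a colon themselves.

```python
# Illustrative tag names only; the real set of tags may differ.
TAGS = {"subj", "from", "to"}

def with_pairs(tokens, fmt="{}:{}"):
    """Yield each token, then a pair built from its predecessor and itself.

    fmt controls the pair layout: "{}:{}" is the single-colon form,
    ":{}:{}" the leading-colon form, "{}::{}" the double-colon form.
    """
    prev = None
    for tok in tokens:
        yield tok
        if prev is not None:
            yield fmt.format(prev, tok)
        prev = tok

def looks_tagged(token):
    """True if token has the shape tag:word for one of the known tags."""
    prefix, colon, rest = token.partition(":")
    return bool(colon) and prefix in TAGS and rest != "" and not rest.startswith(":")

words = ["subj", "free", "offer"]
for fmt in ("{}:{}", ":{}:{}", "{}::{}"):
    clashes = [t for t in with_pairs(words, fmt) if looks_tagged(t)]
    print(fmt, clashes)
```

With the single colon, the body words "subj" and "free" produce the
pair "subj:free", which is indistinguishable from a tagged token. With
either alternative layout the pair no longer has the tag:word shape.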