A case for Markovian

Pavel Kankovsky peak at argo.troja.mff.cuni.cz
Fri May 14 16:23:59 CEST 2004


On Thu, 13 May 2004, David Relson wrote:

> Cons:  computing the hash costs time.  Hashes create possibilities of
> collisions.  Collisions can cause incorrect results, for example if both
> "computer" and "refinance" give the same hash code.

Another con: with hashed tokens it would become next to impossible to
search the database for tokens having certain properties (e.g. all spammy
tokens starting with "boo"). And even if the query did not rely on the
token text itself, you'd still get back a list of hashes rather than
intelligible words.
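
To make the point concrete, here is a minimal Python sketch. Everything in
it is an assumption made up for illustration: the tokens and their scores,
the `token_hash` helper, the choice of a truncated MD5 digest as the hash,
and the use of a plain dict as the "database". It is not how bogofilter
stores anything.

```python
import hashlib

def token_hash(token: str) -> str:
    # Hypothetical hashing scheme: first 8 hex digits of the MD5 digest.
    return hashlib.md5(token.encode("utf-8")).hexdigest()[:8]

# Made-up wordlist: token -> spamicity score.
plain_db = {"book": 0.91, "boost": 0.88, "boolean": 0.12, "computer": 0.05}

# The same wordlist keyed by hash instead of by the token text.
hashed_db = {token_hash(tok): score for tok, score in plain_db.items()}

def spammy_with_prefix(db, prefix, cutoff=0.5):
    # Keys that start with `prefix` and score above `cutoff`.
    return sorted(k for k, s in db.items() if k.startswith(prefix) and s > cutoff)

def spammy(db, cutoff=0.5):
    # Keys scoring above `cutoff`, regardless of what the key looks like.
    return sorted(k for k, s in db.items() if s > cutoff)

# Prefix query: works on plain tokens, finds nothing among the hashes,
# because a hash keeps nothing of the token's spelling.
plain_hits = spammy_with_prefix(plain_db, "boo")
hashed_hits = spammy_with_prefix(hashed_db, "boo")

# Score-only query: still works on the hashed table, but the answer is
# a list of opaque hex strings rather than words.
opaque_hits = spammy(hashed_db)

print(plain_hits)
print(hashed_hits)
print(opaque_hits)
```

The prefix query returns "book" and "boost" from the plain table and
nothing from the hashed one. The score-only query does return two entries
from the hashed table, but there is no way to turn them back into words
short of hashing every candidate token and comparing.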

--Pavel Kankovsky aka Peak  [ Boycott Microsoft--http://www.vcnet.com/bms ]
"Resistance is futile. Open your source code and prepare for assimilation."



