Levenshtein distance as a useful pattern matching algorithm to decipher scrabble spam

Chris Fortune cfortune at telus.net
Tue Aug 23 08:10:45 CEST 2005


> Hi Chris,
>
> My recollection is that token degeneration slowed down bogofilter
> (by increasing the number of database lookups) and produced only an
> insignificant difference in scoring effectiveness.
>
> Though it's been a year or two since I wrote the degen code, I think I
> still have a copy of it (as a patch).  You're welcome to a copy of it,
> though I don't guarantee it'll apply against current bogofilter code.
>
> David
>

David,

Yes, please post the code; it will give me an idea of how to integrate a Levenshtein distance function into bogofilter.  Could you
also review it quickly and give me a few tips on which files to look at for intercepting the db calls in the current bogofilter?
BTW, here is the algorithm implemented in C:  http://www.merriampark.com/ldc.htm
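
For reference, here is a minimal sketch of the kind of function I have in mind. It is not the code from the page above and not anything in bogofilter; the names levenshtein() and min3() are my own placeholders, and it assumes plain single-byte, NUL-terminated tokens. It uses the usual dynamic-programming recurrence, keeping only two rows of the table in memory.

```c
#include <stdlib.h>
#include <string.h>

/* Smallest of three values. Hypothetical helper, not part of bogofilter. */
static size_t min3(size_t a, size_t b, size_t c)
{
    size_t m = a < b ? a : b;
    return m < c ? m : c;
}

/*
 * Levenshtein (edit) distance between two NUL-terminated strings:
 * the minimum number of single-character insertions, deletions and
 * substitutions needed to turn s into t.
 * Returns -1 if memory allocation fails.
 * Hypothetical name and signature, assumed for illustration only.
 */
int levenshtein(const char *s, const char *t)
{
    size_t n = strlen(s);
    size_t m = strlen(t);
    size_t i, j;
    size_t *prev, *curr, *tmp;
    int result;

    /* Two rows of the DP table, each with m + 1 columns. */
    prev = malloc((m + 1) * sizeof *prev);
    curr = malloc((m + 1) * sizeof *curr);
    if (prev == NULL || curr == NULL) {
        free(prev);
        free(curr);
        return -1;
    }

    /* Distance from the empty prefix of s to each prefix of t. */
    for (j = 0; j <= m; j++)
        prev[j] = j;

    for (i = 1; i <= n; i++) {
        /* Distance from the first i characters of s to the empty string. */
        curr[0] = i;
        for (j = 1; j <= m; j++) {
            size_t cost = (s[i - 1] == t[j - 1]) ? 0 : 1;
            curr[j] = min3(prev[j] + 1,         /* deletion     */
                           curr[j - 1] + 1,     /* insertion    */
                           prev[j - 1] + cost); /* substitution */
        }
        /* The row just computed becomes the previous row. */
        tmp = prev;
        prev = curr;
        curr = tmp;
    }

    result = (int)prev[m];
    free(prev);
    free(curr);
    return result;
}
```

With this, levenshtein("viagra", "v1agra") is 1 and levenshtein("viagra", "vaigra") is 2 (a swapped pair counts as two substitutions), so one possible approach would be to treat an unknown token as a match for a known token when the distance is below some small threshold. The threshold and where to hook this into the lookups are open questions.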

Chris




