Levenshtein distance as a useful pattern matching algorithm to decipher scrabble spam

Eric Wood eric at interplas.com
Sat Feb 19 21:22:49 EST 2005


From: "Chris Fortune"
> in Spam:
>     J at mes
>     j@/\/\e5
>     J4M3S
>     Jmaes
>     j_ at _m.ez
>     ...

But, you know, this list of spam tokens looks strangely like procmail recipes
or programming variables, etc., which would probably not pass an
algorithm created in '65.
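
To make that worry concrete, here is a minimal sketch of plain Levenshtein
distance in Python. The choice of language, the function name, and the
comparison word "names" are assumptions made for illustration only; nothing
here is part of bogofilter.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes, substitutions."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] = distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string costs i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        previous = current
    return previous[-1]


# Compare lowercased tokens against the target word "james".
for token in ("J4M3S", "Jmaes", "names"):
    print(token, levenshtein(token.lower(), "james"))
# J4M3S 2
# Jmaes 2
# names 1
```

In this sketch the ordinary word "names" sits closer to "james" than either
obfuscated spelling does, so a threshold loose enough to catch the spam
variants would also catch innocent tokens.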

The only spam I get that passes through bogofilter almost always contains a
junky domain to click on, e.g. sg3e3.com: anything random that the spammer
registered purely for his latest spam campaign.  I call these "sqish"
domains.  They're so crappy, they don't even make sense - however, they work!

I wish bogofilter did a collaborative "razor"-style lookup with other
bogofilter systems, so that a known sqish domain could be the only token
needed to decide between ham and spam.  Having everybody train on these
sqish domains individually will be too slow to be effective.  Besides, how
can a spammish "sqish" domain token ever outweigh the ham-spew they throw
in, or content that is just a picture?
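
A rough sketch of the kind of lookup I mean, in Python. Everything here is
hypothetical: bogofilter has no such feature, the function names are made up,
and `shared_spam_domains` is just a local set standing in for whatever
collaborative service would actually hold the reported domains.

```python
import re

# Loose pattern for host names in message text; an assumption, not a full
# RFC-compliant parser.
HOST_RE = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}\b",
    re.IGNORECASE,
)


def extract_hosts(text):
    """Return the set of lowercased host names found in the text."""
    return {m.group(0).lower() for m in HOST_RE.finditer(text)}


def domain_suffixes(host):
    """www.sg3e3.com -> {www.sg3e3.com, sg3e3.com}; the bare TLD is skipped."""
    parts = host.split(".")
    return {".".join(parts[i:]) for i in range(len(parts) - 1)}


def verdict_from_domains(text, shared_spam_domains):
    """Call the message spam if any host in it matches a shared domain."""
    hits = set()
    for host in extract_hosts(text):
        hits |= domain_suffixes(host) & shared_spam_domains
    return ("spam" if hits else "unsure", hits)


# Stand-in for the collaborative database (hypothetical contents).
shared_spam_domains = {"sg3e3.com"}
print(verdict_from_domains("Click http://www.sg3e3.com/offer now",
                           shared_spam_domains))
# ('spam', {'sg3e3.com'})
```

The point of the design is that a single domain hit decides the message, so
the ham-spew and picture content never get a chance to dilute it.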

-eric wood


