Levenshtein distance as a useful pattern matching algorithm to decipher scrabble spam

Eric Wood eric at interplas.com
Sun Feb 20 03:22:49 CET 2005


From: "Chris Fortune"
> in Spam:
>     J@mes
>     j@/\/\e5
>     J4M3S
>     Jmaes
>     j_@_m.ez
>     ...

But, you know, this list of spam tokens looks strangely like procmail recipes,
programming variables, etc., which would probably not pass an algorithm
created in '65 either.
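
To make that concern concrete, here is a minimal sketch of the classic
dynamic-programming Levenshtein distance, not anything bogofilter ships. The
target word "james" and the ordinary words "names" and "jones" are assumptions
chosen for illustration; tokens are lowercased before comparison.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


target = "james"  # illustrative target word, an assumption
for token in ["J@mes", "J4M3S", "Jmaes", "names", "jones"]:
    print(token, levenshtein(token.lower(), target))
```

With this sketch the obfuscated "J@mes" and the ordinary word "names" both sit
at distance 1 from "james", so a plain distance threshold cannot separate
them by itself.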

The only spam I get that passes through bogofilter almost always contains a
junky domain to click on, e.g. sg3e3.com.  Just anything random that the
spammer registered purely for his latest spam campaign.  I call these
"sqish" domains.  They're so crappy, they don't even make sense -
however, they work!

I wish bogofilter did a collaborative "razor"-style lookup with other
bogofilter systems, testing sqish domains as the only token that decides
ham or spam.  Having everybody train on these sqish domains individually
will be too slow to be effective.  Besides, how can a spammish "sqish" domain
token ever outweigh the ham-spew they throw in, or content that is just a
picture?
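
A rough sketch of what such a lookup could look like, under stated
assumptions: bogofilter has no such feature as far as this post says, the
names SHARED_SQISH_DOMAINS, linked_domains and is_spam_by_domain are
hypothetical, and the shared list is a hard-coded set standing in for
whatever the cooperating systems would exchange.

```python
import re

# Hypothetical shared list; a real system would fetch this from its peers.
SHARED_SQISH_DOMAINS = {"sg3e3.com"}

# Capture the host part of http(s) links found in a message body.
URL_HOST = re.compile(r"https?://([A-Za-z0-9.-]+)", re.IGNORECASE)


def linked_domains(body):
    """Return the lowercased hosts of all links, without a leading 'www.'."""
    hosts = set()
    for host in URL_HOST.findall(body):
        host = host.lower().rstrip(".")  # drop sentence-ending dots
        if host.startswith("www."):
            host = host[4:]
        hosts.add(host)
    return hosts


def is_spam_by_domain(body, shared=SHARED_SQISH_DOMAINS):
    """Treat a message as spam if any linked host is on the shared list."""
    return not linked_domains(body).isdisjoint(shared)


print(is_spam_by_domain("plenty of hammy filler text http://www.sg3e3.com/offer"))
```

Because the domain alone decides the verdict here, the amount of ham-like
filler around the link does not matter.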

-eric wood



