odd missing word in R list
Suzanne Skinner
tril at igs.net
Tue Dec 17 20:26:33 CET 2002
On Mon, Dec 16, 2002 at 08:15:40AM -0500, David Relson wrote:
> It was a quick test to see if "len<!---xyzw-->ders" would come out as "len"
> and "ders" or as "lenders". I expected the worst result and got it.
I question whether this is really a bad thing. It strikes me that spammers are
shooting themselves in the foot with this ploy!
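To make the two outcomes in David's test concrete, here is a minimal sketch. It is not bogofilter's lexer: the regular expressions and both function names are made up for illustration. One function treats an HTML comment as a token delimiter, the other deletes it before tokenizing.

```python
import re

# Hypothetical patterns for illustration only; bogofilter's real lexer differs.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
WORD = re.compile(r"[A-Za-z0-9]+")

def tokens_comment_as_delimiter(text):
    """Replace each HTML comment with a space, so it splits the word."""
    return WORD.findall(COMMENT.sub(" ", text))

def tokens_comment_stripped(text):
    """Delete each HTML comment outright, so the halves rejoin."""
    return WORD.findall(COMMENT.sub("", text))

if __name__ == "__main__":
    sample = "len<!---xyzw-->ders"
    print(tokens_comment_as_delimiter(sample))  # ['len', 'ders']
    print(tokens_comment_stripped(sample))      # ['lenders']
```

The first behaviour is the one that yields the half-word tokens discussed below.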
As mentioned previously, I've received a number of these spams and bogofilter
has been doing an excellent job of catching them. My spam db is filled with
gibberish half-words that occur only in spam. Here's a portion of the scoring
for one that came in just today:
            ham#  spam#
            ----  -----
 81  bui    0.00     11  0.999947   -9.84188  -0.00005
 82  cts    0.00     11  0.999947   -9.84188  -0.00005
 83  ght    0.00     11  0.999947   -9.84188  -0.00005
 84  tion   0.00     12  0.999951   -9.92889  -0.00005
 85  wei    0.00     12  0.999951   -9.92889  -0.00005
 86  cle    0.00     17  0.999966  -10.27717  -0.00003
 87  gua    0.00     18  0.999968  -10.33433  -0.00003
 88  medi   0.00     18  0.999968  -10.33433  -0.00003
 89  mory   0.00     18  0.999968  -10.33433  -0.00003
 90  xual   0.00     18  0.999968  -10.33433  -0.00003
 91  mus    0.00     20  0.999971  -10.43968  -0.00003
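The last three columns of each row appear consistent with a Robinson-style smoothed probability f = (s*x + n*p) / (s + n), followed by ln(1 - f) and ln(f). The sketch below is a reconstruction under that assumption, not code from bogofilter. The constants s = 0.001 and x = 0.415 are not stated in this post; they are inferred by fitting the numbers above. The function names are hypothetical.

```python
import math

# Assumed constants, inferred by fitting the listing; not stated in the post.
S = 0.001   # weight given to the prior
X = 0.415   # prior spam probability for a token never seen before

def robinson_f(spam_count, ham_count, s=S, x=X):
    """Robinson-style smoothed spam probability for one token."""
    n = spam_count + ham_count
    # Raw spam fraction. This ignores any scaling by corpus size,
    # which makes no difference when the ham count is zero.
    p = spam_count / n if n else x
    return (s * x + n * p) / (s + n)

def score_columns(spam_count, ham_count=0):
    """Return (f, ln(1 - f), ln(f)) for a token."""
    f = robinson_f(spam_count, ham_count)
    return f, math.log(1.0 - f), math.log(f)

if __name__ == "__main__":
    for token, spam in [("bui", 11), ("tion", 12), ("cle", 17),
                        ("gua", 18), ("mus", 20)]:
        f, log_not_f, log_f = score_columns(spam)
        print(f"{token:5s} {spam:3d} {f:.6f} {log_not_f:.5f} {log_f:.5f}")
```

Under these assumptions, a token seen 11 times in spam and never in ham comes out at roughly 0.999947, with ln(1 - f) near -9.8419, which agrees with the "bui" row. Each extra spam sighting pushes the value a little closer to 1.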
Or, as Paul Graham himself said:
    So, as spammers start using "c0ck" instead of "cock" to evade simple-minded
    spam filters based on individual words, Bayesian filters automatically notice.
    Indeed, "c0ck" is far more damning evidence than "cock", and Bayesian filters
    know precisely how much more.
In other words, let the Bayesian filter strut its stuff! No need to turn
ourselves into SpamAssassin by attempting ever-more-sophisticated heuristic
analysis.
Suzanne
--
tril at igs.net - http://www.igs.net/~tril/
A Pope has a Water Cannon. It is a Water Cannon.
He fires Holy-Water from it. It is a Holy-Water Cannon.
He Blesses it. It is a Holy Holy-Water Cannon.
He Blesses the Hell out of it. It is a Wholly Holy Holy-Water Cannon.
He has it pierced. It is a Holey Wholly Holy Holy-Water Cannon.
Batman and Robin arrive. He shoots them.
-- Principia Discordia
More information about the bogofilter-dev mailing list