odd missing word in R list

Suzanne Skinner tril at igs.net
Tue Dec 17 20:26:33 CET 2002


On Mon, Dec 16, 2002 at 08:15:40AM -0500, David Relson wrote:

> It was a quick test to see if "len<!---xyzw-->ders" would come out as "len" 
> and "ders" or as "lenders".  I expected the worst result and got it.

I question whether this is really a bad thing. It strikes me that spammers are
shooting themselves in the foot with this ploy!
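
To make the two outcomes concrete, here is a minimal sketch of a toy
tokenizer. It is not bogofilter's lexer; the names COMMENT, WORD,
tokens_split and tokens_joined are made up for illustration. It only shows
how treating an HTML comment as a token boundary yields the half-words,
while deleting the comment first yields the whole word.

```python
import re

# Hypothetical patterns for illustration only (not bogofilter's lexer).
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)  # an HTML comment
WORD = re.compile(r"[A-Za-z0-9]+")              # a very simple notion of "word"


def tokens_split(text):
    """Treat each HTML comment as a token boundary."""
    return WORD.findall(COMMENT.sub(" ", text))


def tokens_joined(text):
    """Delete HTML comments before tokenizing, so split words rejoin."""
    return WORD.findall(COMMENT.sub("", text))


sample = "len<!---xyzw-->ders"
print(tokens_split(sample))   # half-words on either side of the comment
print(tokens_joined(sample))  # the word as a human reader would see it
```

Either way the filter gets tokens to learn from; the split form simply
learns the fragments instead of the word.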

As mentioned previously, I've received a number of these spams and bogofilter
has been doing an excellent job of catching them. My spam db is filled with
gibberish half-words that occur only in spam. Here's a portion of the scoring
for one that came in just today:

                                 ham#       spam#
                                 ----        --
   81  bui                       0.00        11  0.999947   -9.84188  -0.00005
   82  cts                       0.00        11  0.999947   -9.84188  -0.00005
   83  ght                       0.00        11  0.999947   -9.84188  -0.00005
   84  tion                      0.00        12  0.999951   -9.92889  -0.00005
   85  wei                       0.00        12  0.999951   -9.92889  -0.00005
   86  cle                       0.00        17  0.999966  -10.27717  -0.00003
   87  gua                       0.00        18  0.999968  -10.33433  -0.00003
   88  medi                      0.00        18  0.999968  -10.33433  -0.00003
   89  mory                      0.00        18  0.999968  -10.33433  -0.00003
   90  xual                      0.00        18  0.999968  -10.33433  -0.00003
   91  mus                       0.00        20  0.999971  -10.43968  -0.00003
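
The probabilities in this listing are consistent with Robinson-style
smoothing, f(w) = (s*x + n*p) / (s + n), where n is the number of times the
token was seen and p is its raw spam probability. The sketch below assumes
s = 0.001 and x = 0.415; the listing does not state those values, they are
chosen here because they reproduce the figures shown. It also assumes the
last two columns are ln(1 - f) and ln(f), and it skips any normalisation by
the sizes of the spam and ham word lists, which does not matter when the ham
count is zero.

```python
import math

S = 0.001  # assumed strength given to the prior (not stated in the listing)
X = 0.415  # assumed score for a token never seen before (not stated either)


def token_score(spam_count, ham_count):
    """Smoothed spam probability f(w) = (S*X + n*p) / (S + n)."""
    n = spam_count + ham_count
    if n == 0:
        return X  # nothing known about the token, fall back to the prior
    p = spam_count / n  # simplification: ignores word-list size normalisation
    return (S * X + n * p) / (S + n)


# Spam counts taken from the listing above; the ham count is zero throughout.
for name, spam in [("bui", 11), ("tion", 12), ("cle", 17), ("gua", 18), ("mus", 20)]:
    f = token_score(spam, 0)
    print(f"{name:5s} {spam:3d}  {f:.6f}  {math.log(1 - f):.5f}  {math.log(f):.5f}")
```

The point of the arithmetic is that a fragment seen 11 times in spam and
never in ham already scores within a hair of 1.0, and each further sighting
pushes it closer.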

Or, as Paul Graham himself said:

   So, as spammers start using "c0ck" instead of "cock" to evade simple-minded
   spam filters based on individual words, Bayesian filters automatically notice.
   Indeed, "c0ck" is far more damning evidence than "cock", and Bayesian filters
   know precisely how much more.

In other words, let the Bayesian filter strut its stuff! No need to turn
ourselves into SpamAssassin by attempting ever-more-sophisticated heuristic
analysis.

Suzanne

-- 
tril at igs.net - http://www.igs.net/~tril/

A Pope has a Water Cannon.                               It is a Water Cannon.
He fires Holy-Water from it.                        It is a Holy-Water Cannon.
He Blesses it.                                 It is a Holy Holy-Water Cannon.
He Blesses the Hell out of it.          It is a Wholly Holy Holy-Water Cannon.
He has it pierced.                It is a Holey Wholly Holy Holy-Water Cannon.
Batman and Robin arrive.                                       He shoots them.
                                    -- Principia Discordia



