further advice for asian spam and spam assassin text

David Relson relson at osagesoftware.com
Wed Sep 24 17:49:28 CEST 2003


On Wed, 24 Sep 2003 17:13:48 +0200
Boris 'pi' Piwinger <3.14 at logic.univie.ac.at> wrote:

> > ----- Forwarded message from XAEvxzl at iris.seed.net.tw -----
> 
> Please do *not* forward spam to this list. It pollutes the
> database.

pete,

What I received looked like "?M???S??(????30??1000??)????????".  As
question marks aren't accepted by bogofilter's parser as part of a
token, this parses as (roughly), "M", "S", "30", "1000".  None of these
are valid tokens because they're too short or all numeric.   So the
forwarded message looks pretty darn harmless.
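
To make that concrete, here is a rough Python sketch of the filtering described above. It is not bogofilter's actual lexer; the minimum length of 3 and the "letters and digits only" rule are assumptions chosen just to reproduce the behavior I described.

```python
import re

# Assumed rules (NOT bogofilter's real lexer): a candidate token is a run of
# ASCII letters/digits; it is kept only if it has at least MIN_LEN characters
# and is not made up entirely of digits.
MIN_LEN = 3  # hypothetical minimum token length


def tokenize(text):
    """Return (candidates, kept) for a line of text."""
    # '?' and punctuation act as separators, so they never appear in a token.
    candidates = re.findall(r"[A-Za-z0-9]+", text)
    kept = [t for t in candidates if len(t) >= MIN_LEN and not t.isdigit()]
    return candidates, kept


candidates, kept = tokenize("?M???S??(????30??1000??)????????")
print(candidates)  # ['M', 'S', '30', '1000']
print(kept)        # []
```

Under these assumed rules nothing from the forwarded line survives, which is why it adds nothing to the word list.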

When I want to include a spam message or mailbox, I gzip it knowing that
the binary encoded attachment will not bother bogofilter.
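
For anyone who prefers to script it, a minimal Python sketch of the same idea follows. The function name and the sample message are made up for illustration; the command-line gzip works just as well.

```python
import gzip


def gzip_message(raw: bytes) -> bytes:
    """Compress a raw message so its text is no longer visible as plain words."""
    return gzip.compress(raw)


# Hypothetical sample message, for illustration only.
sample = b"Subject: example\n\nbody text of the message\n"
packed = gzip_message(sample)

# The compressed bytes start with the gzip magic number and round-trip cleanly.
print(packed[:2] == b"\x1f\x8b")
print(gzip.decompress(packed) == sample)
```

The compressed bytes would then be attached to the mail rather than pasted inline.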


> 
> > Subject: *****SPAM***** ¥þ³¡¥X²M
>                           ~~~~~~~~
> 
> This could be used (eight non-ASCII characters in a row).
> 
> > ?M???S??(????30??1000??)????????
> [...]
> 
> This is pretty much what happens without charset declaration.

... might be readable by someone whose default charset is the same as
the incoming (undeclared) text.  A spammer who sends undeclared Chinese
to someone expecting Japanese has failed to provide a readable message.
If Darwin is right, they won't survive in the spam business.

Peace,

David



