filter evasion

David Relson relson at osagesoftware.com
Fri Nov 7 00:30:36 CET 2003


On Thu, 6 Nov 2003 15:10:05 -0600
John McCain <jmccain at layer3al.com> wrote:

> I'm seeing messages slip through that have the following
> characteristics:
> 
> sp</ham>am
> 
> Which tokenize as:
> sp
> ham
> am
> 
> Previously, this sort of thing was done with html comments:
> 
> sp<!--ham-->am
> 
> Which would tokenize (in .15.8) as:
> spam
> 
> What can be done about this?  I personally don't think it would be a
> great loss to simply ignore all html closing tags.  I can't think of
> any other HTML evil which could be perpetrated to do this any other
> sort of way without seriously disrupting the text.

John,

Is this happening in text labeled as html or as plain text?  If it's
html text, the "<!--ham-->" should be ignored and bogofilter sees
"spam".  In plain text, bogofilter will divide the input according to
special characters, see "sp", "ham", and "am" -- and ignore the
character pairs because it ignores tokens shorter than three characters.
If you're seeing behavior other than I've described, please gzip the
original message and send it to me.
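The behavior described above can be illustrated with a toy tokenizer. This is a minimal sketch and not bogofilter's real lexer, which is a flex scanner with its own rules. The function name `tokenize`, the constant `MIN_TOKEN_LEN`, the comment-stripping regex, and the choice to split on any non-word character are all hypothetical simplifications. The sketch drops HTML comments only when the text is treated as HTML, then splits on special characters and discards tokens shorter than three characters.

```python
import re

# Hypothetical constant mirroring the "ignore tokens shorter than
# three characters" rule described above.
MIN_TOKEN_LEN = 3

# Matches an HTML comment such as <!--ham-->, including multi-line ones.
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def tokenize(text, is_html):
    """Toy tokenizer for illustration only; not bogofilter's lexer."""
    if is_html:
        # Remove comments entirely so the text on either side joins up.
        text = HTML_COMMENT.sub("", text)
    # Split on runs of anything that is not a letter, digit, or underscore.
    pieces = re.split(r"\W+", text)
    # Drop short pieces (and the empty strings re.split can produce).
    return [p for p in pieces if len(p) >= MIN_TOKEN_LEN]


print(tokenize("sp<!--ham-->am", is_html=True))   # ['spam']
print(tokenize("sp<!--ham-->am", is_html=False))  # ['ham']
print(tokenize("sp</ham>am", is_html=True))       # ['ham']
```

In this sketch the HTML comment vanishes and "spam" survives as one token. A closing tag such as `</ham>` is not stripped, however, so `sp</ham>am` yields only `ham`, which matches the evasion John reported.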

By the way, are you _really_ seeing tokens "sp" and "am"?  If so, are
you running pi's lexer?

Thanks.

David
