filter evasion

Eric Wood eric at interplas.com
Fri Nov 7 14:21:43 CET 2003


This is something I put in just before the call to bogofilter:

# Expand TABs to spaces and strip HTML comments (used to split up words)
:0 HB
* !^From:.*spam at idomain\.com
* !^X-Loop:.*spam
* !^TOspam at domain\.com
* < 20000
* text/html
{
  :0fwb
  | expand | sed -e :a -e 's/<!--[^-]*-->//g;/</N;//ba'
}
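To see what the filtering stage does by itself, you can run the same pipeline outside procmail. This is a minimal sketch that assumes GNU sed, because the `//ba` loop depends on the empty regex reusing the last one applied. `strip_html_comments` is a hypothetical wrapper name and is not part of procmail or bogofilter.

```shell
#!/bin/sh
# Hypothetical wrapper around the filter stage of the recipe above:
# expand TABs to spaces, then delete <!-- ... --> comments. While a '<'
# remains in the pattern space, sed appends the next line (N) and loops
# back to :a, so comments split across lines are also caught.
strip_html_comments() {
  expand | sed -e :a -e 's/<!--[^-]*-->//g;/</N;//ba'
}

# The word-splitting trick from the quoted message below:
printf 'sp<!--ham-->am\n' | strip_html_comments
```

Because the pattern uses `[^-]*`, any comment that contains a hyphen (e.g. `<!-- a-b -->`) is left in place.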

# Use lynx to strip html and search /etc/vmail/spam_words
:0 HB
* !^From:.*spam at idomain\.com
* !^X-Loop:.*spam
* !^TOspam at domain\.com
* < 20000
* text/html
* ? lynx -dump -stdin | grep -i -f /etc/vmail/spam_words
{
  :0 fwh
  | formail -A"X-Loop: spam"
  :0
  ! spam at intgrp\.com
}
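The `grep -i -f /etc/vmail/spam_words` condition succeeds when any line of the lynx dump matches any pattern in the file. The file holds one pattern per line, and `-i` makes the match case-insensitive. Below is a sketch of that test using a throwaway word list. The words and the temporary file stand in for `/etc/vmail/spam_words`, whose real contents aren't shown. The sketch also adds `-q` so grep prints nothing.

```shell
#!/bin/sh
# Stand-in for /etc/vmail/spam_words: one grep pattern per line
# (hypothetical contents). Removed automatically when the script exits.
words=$(mktemp)
trap 'rm -f "$words"' EXIT
printf 'viagra\nmortgage rates\n' > "$words"

# Mimics the procmail '* ? cmd' condition, which is true only when the
# command exits 0; grep -q exits 0 if any input line matches any pattern.
matches_spam_words() {
  grep -q -i -f "$words"
}

if printf 'Lowest MORTGAGE RATES today\n' | matches_spam_words; then
  echo "condition true: forward to the spam address"
else
  echo "condition false: keep processing"
fi
```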


Now, I guess it would be better if you did this:
* ? lynx -dump -stdin | bogofilter -u
instead of:
* ? lynx -dump -stdin | grep -i -f /etc/vmail/spam_words


But I'm not an expert, so I'll defer to someone more knowledgeable.
Then again, you might want to let bogofilter test it both ways:

:0 HB
* ? lynx -dump -stdin | bogofilter -u
{
  :0 fwh
  | formail -A"X-Loop: spam"
  :0
  ! spam at domain\.com
}

:0 HB
* ? bogofilter -u
{
  :0 fwh
  | formail -A"X-Loop: spam"
  :0
  ! spam at domain\.com
}
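Both recipes depend on how procmail reads a `* ?` condition: the condition is true only when the command exits 0. bogofilter's man page documents these exit codes for current releases: 0 for spam, 1 for non-spam, 2 for unsure (ternary mode), and 3 for I/O or other errors. Older versions may differ. The sketch below uses a hypothetical helper, `describe_bogofilter_status`, to show what each status means for the recipes above. Note that unsure and error results are both treated as "not spam".

```shell
#!/bin/sh
# Hypothetical helper: explains how a procmail '* ?' condition treats each
# bogofilter exit status (only 0 makes the condition true).
describe_bogofilter_status() {
  case "$1" in
    0) echo "spam: condition true" ;;
    1) echo "ham: condition false" ;;
    2) echo "unsure: condition false" ;;
    *) echo "error: condition false" ;;
  esac
}

for status in 0 1 2 3; do
  printf '%s -> ' "$status"
  describe_bogofilter_status "$status"
done

# With bogofilter installed and a saved message (hypothetical file name):
#   bogofilter -u < message.eml; describe_bogofilter_status $?
```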





John McCain wrote:
> I'm seeing messages slip through that have the following
> characteristics:
>
> sp</ham>am
>
> Which tokenize as:
> sp
> ham
> am
>
> Previously, this sort of thing was done with html comments:
>
> sp<!--ham-->am
>
> Which would tokenize (in 0.15.8) as:
> spam
>
> What can be done about this?  I personally don't think it would be a
> great loss to simply ignore all html closing tags.  I can't think of
> any other HTML evil which could be perpetrated to do this any other
> sort of way without seriously disrupting the text.
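You can try John's suggestion of ignoring closing tags with the same sed approach used to strip comments. This is a minimal sketch and not something bogofilter itself does. `strip_splitting_markup` is a hypothetical name. Unlike the `:a`/`N` loop in the recipe above, it works one line at a time.

```shell
#!/bin/sh
# Hypothetical extension of the comment-stripping filter: delete HTML
# comments and also closing tags such as </ham>, so that "sp</ham>am"
# is rejoined as "spam". Works line by line (no multi-line joining).
strip_splitting_markup() {
  sed -e 's/<!--[^-]*-->//g' -e 's:</[^>]*>::g'
}

printf 'sp<!--ham-->am\n' | strip_splitting_markup
printf 'sp</ham>am\n' | strip_splitting_markup
```

The trade-off is that legitimate markup is affected too. For example, `end</td>start` becomes `endstart`. This is the kind of text disruption John mentions.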