filter evasion

Janne Nikula jni at sdf-eu.org
Fri Nov 7 07:26:36 CET 2003


* Matthias Andree <matthias.andree at gmx.de> wrote:
> Stefan Mashkevich <mash at mashke.org> writes:
> > Same here. I also see 'sp<!ham>am' -- didn't even know one could write
> > HTML comments that way -- but that does already tokenize properly
> > ('spam').
> 
> What makes people accept HTML mail today, given all the dangers that
> bogofilter will not detect?

I'm not sure whether this is of any interest, but here are simple
message counts from my personal use of bogofilter over the six-month
period from 1st May 2003 to 31st October 2003. I have kept up with the
stable bogofilter releases, recompiling my local installation whenever
a new stable version was released.

  Total number of legit messages received: 12987
    Legit messages classified as legit: 12987 (100.0%)
--> Legit messages classified as spam: 0 (0.0%) <--

  Total number of spam messages received: 13844
    Spam messages classified as spam: 13228 (95.6%)
    Spam messages classified as legit mail: 616 (4.4%)
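
For anyone who wants to recompute the percentages from the raw counts,
here is a minimal Python sketch. The helper name rate() and the
variable names are just illustrative; only the counts come from the
figures above.

```python
def rate(part, total):
    """Return part/total as a percentage rounded to one decimal place."""
    return round(100.0 * part / total, 1)

# Counts for 1st May 2003 to 31st October 2003, as listed above.
ham_total, ham_as_spam = 12987, 0
spam_total, spam_as_ham = 13844, 616

# Legit mail wrongly classified as spam (false positives).
false_positive_rate = rate(ham_as_spam, ham_total)
# Spam that slipped through as legit mail (false negatives).
false_negative_rate = rate(spam_as_ham, spam_total)
# Spam correctly classified as spam.
spam_caught_rate = rate(spam_total - spam_as_ham, spam_total)

print(false_positive_rate, false_negative_rate, spam_caught_rate)
```

The false positive rate is the one I care about most, since a lost
legit message costs far more than a spam that gets through.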

My personal reason for preferring bogofilter over other approaches
(checks for suspicious headers, DNS-based blacklists, etc.) is that, in
my experience, Bayesian classification is basically the only way to
achieve reasonable spam-filtering figures without throwing any legit
messages away. Rejecting all HTML mail would inevitably discard some
legit mail. Maybe not much, but I still wouldn't want to go back to the
methods I used before bogofilter, which threw away lots of spam but
occasionally legit mail as well.

Bogofilter is so much better than simple Content-Type checks that I
hope development of its HTML parsing will carry on, even when it
occasionally looks like mission impossible.
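
To illustrate the kind of HTML handling I mean, the 'sp<!ham>am' trick
quoted at the top can be neutralised by dropping comment-like markup
before splitting the text into tokens. This is only a rough Python
sketch under my own assumptions, not bogofilter's actual lexer; the
names COMMENT and tokenize are made up for the example.

```python
import re

# Matches '<!' up to the next '>'. This covers '<!ham>' and simple
# '<!-- x -->' comments, but not comments that contain '>' themselves.
COMMENT = re.compile(r"<![^>]*>")

def tokenize(text):
    """Strip comment-like markup, then split into lowercase word tokens."""
    stripped = COMMENT.sub("", text)
    return re.findall(r"[a-z0-9']+", stripped.lower())

print(tokenize("Buy sp<!ham>am now"))
```

Without the stripping step the word would be split into 'sp', 'ham'
and 'am', none of which says much about spam on its own.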
