[bogofilter] Re: [bogofilter] Test sets, accuracy and other things

Matt Armstrong matt at lickey.com
Tue Sep 10 21:11:27 CEST 2002


Jonathan Buzzard <jonathan at buzzard.org.uk> writes:
> 
> The first thing I am going to say is that ignoring the contents of
> MIME attachments is a sure-fire way to let spam get through. In tests,
> if I take any plain text spam message from August and put it in a
> quoted-printable plain text attachment, it passes straight through
> bogofilter. Admittedly, in the 2500 spams I have received I don't think
> any such spam exists, but if bogofilter ever becomes ubiquitous I am
> sure they will start appearing quickly. Therefore my first suggestion
> is that bogofilter should examine the contents of at least
> quoted-printable MIME attachments.

I'm currently using "spamoracle", which implements essentially the same
algorithm as bogofilter, except that it decodes quoted-printable and
base64 attachments before processing them.  Spamoracle, purely
subjectively, seems to do a better job than bogofilter did when I was
using it (a few versions ago).
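
To make the decoding step concrete, here is a rough sketch in Python of
pulling the decoded text out of every text part of a message before
tokenizing it.  This is not spamoracle's or bogofilter's code; the
function name extract_text_parts is made up for illustration, and it
leans on the standard library's email package to undo the
quoted-printable or base64 transfer encoding.

```python
import email
from email import policy


def extract_text_parts(raw_message: bytes) -> list[str]:
    """Return the decoded body of every text/* part in a message.

    Hypothetical helper for illustration only. With policy.default,
    get_content() reverses the Content-Transfer-Encoding
    (quoted-printable, base64, ...) and decodes the declared charset.
    """
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    texts = []
    for part in msg.walk():
        # Multipart containers have maintype "multipart" and are skipped.
        if part.get_content_maintype() == "text":
            texts.append(part.get_content())
    return texts
```

The returned strings could then be fed to whatever tokenizer the filter
already uses.  A real filter would also need to cope with unknown
charsets and malformed MIME, which this sketch does not attempt.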

Another interesting thing spamoracle does is turn runs of various
"suspect" characters into virtual words.  E.g. 5 consecutive
non-ASCII characters become the word W5, and a 7-character word in all
upper case becomes U7.  The non-ASCII virtual word capability in
particular catches all "Korean" spam.
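
A minimal sketch of the virtual-word idea in Python, assuming only the
two rules described above (W<n> for a run of n non-ASCII characters,
U<n> for an all-upper-case word of n letters).  The regular expression,
the tokenize name, and the lower-casing of ordinary words are my own
assumptions, not spamoracle's actual lexer.

```python
import re

# Either a run of non-ASCII characters or a run of ASCII letters.
# Assumption: a "word" here is ASCII letters only; real tokenizers
# usually accept more characters.
_TOKEN = re.compile(r"[^\x00-\x7f]+|[A-Za-z]+")


def tokenize(text: str) -> list[str]:
    """Split text into tokens, replacing suspect runs with virtual words."""
    tokens = []
    for match in _TOKEN.finditer(text):
        tok = match.group()
        if ord(tok[0]) > 0x7F:
            # Run of n non-ASCII characters becomes the virtual word Wn.
            tokens.append(f"W{len(tok)}")
        elif tok.isupper():
            # All-upper-case word of n letters becomes the virtual word Un.
            tokens.append(f"U{len(tok)}")
        else:
            tokens.append(tok.lower())
    return tokens
```

With this sketch, tokenize("WINNERS only") gives ["U7", "only"].  The
point is that the filter learns probabilities for W5 or U7 just as it
does for ordinary words, so it never has to understand the text itself.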

> Now if bogofilter returned a different exit code for emails that,
> although not spam, contained a MIME attachment, I could restrict the
> virus scanning to those 6.5% of emails that could possibly contain
> a virus.

In procmail you'd do this with:

:0
* ^Content-Type:.*multipart
{
    # run your virus scanner here
}

This has the advantage of keeping the interface to bogofilter simple.



