base64, quoted-printable pre-processor for bogofilter

Allyn Fratkin allyn at fratkin.com
Wed Oct 23 06:11:58 CEST 2002


attached is a preprocessor that i use to decode base64 and quoted-printable
text attachments before feeding them to bogofilter.  note that i do not
use the -p feature to add the bogofilter header to the email.

THIS IS NOT TO BE USED WITH -p !!

this is a very quick-and-dirty script that does not really understand
mime syntax very well and is not intended to transform the message
in syntactically correct ways.  this is only useful as a dead-end
"nobody but bogofilter will be looking at the output so it doesn't
need to be correct" filter.  in fact, non-text base64 attachments are
completely removed from the output!

THIS IS NOT TO BE USED WITH -p !!

it uses a state machine to notice when a text attachment is approaching
and starts decoding until it thinks it finds the end of the attachment.
it can be easily fooled during quoted-printable attachments: any line
starting with "--" or "From " is treated as the end of the attachment.
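
A minimal sketch of why quoted-printable parts are the weak spot, assuming the end-of-attachment test is the one in the attached script (`/^--[^>]/` or `/^From /`). The helper name `looks_like_end` and the sample lines are made up for illustration and do not appear in the script. Base64 lines contain neither "-" nor a space, so they cannot trigger the test; ordinary quoted-printable text can.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# hypothetical helper: the same end-of-part test the attached script applies
# to every input line
sub looks_like_end {
    my $line = shift;
    return ($line =~ /^--[^>]/ || $line =~ /^From /) ? 1 : 0;
}

# sample lines that could appear inside a quoted-printable text part
my @samples = (
    "--boundary42\n",               # a real MIME boundary
    "-- \n",                        # a signature delimiter inside the text
    "From here on, prices drop\n",  # ordinary prose that starts with "From "
    "-->\n",                        # explicitly excluded by [^>]
    "plain text\n",                 # nothing special
);

for my $line (@samples) {
    (my $shown = $line) =~ s/\n\z//;
    printf "%-28s %s\n", "[$shown]", looks_like_end($line) ? "ends part" : "keeps decoding";
}
```

Once a line matches, the script clears all of its state, so the rest of that part is passed through undecoded.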

THIS IS NOT TO BE USED WITH -p !!

i am sending it because someone might find it useful.
-- 
Allyn Fratkin             allyn at fratkin.com
Escondido, CA             http://www.fratkin.com/
-------------- next part --------------
#!/usr/bin/perl

#
# unbase64
#
# allyn fratkin <allyn at fratkin.com>
#
# DO NOT USE AS A FILTER!!
# USE ONLY AS INPUT TO BOGOFILTER WITHOUT -p OPTION!
#
# decodes base64 and quoted-printable text attachments
# deletes binary base64 attachments
# other minor preprocessing for bogofilter
# 
# decoding routines were "lifted" from dmmh script found on internet
# dmmh author is Per Hedeland <per at erix.ericsson.se>
#

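# state machine flags
$inbase64 = 0;		# currently inside a base64 body
$inqp = 0;		# currently inside a quoted-printable body
$base64approaching = 0;	# base64 header seen; body starts at next blank line
$qpapproaching = 0;	# quoted-printable header seen; body starts at next blank line
$textattachment = 0;	# text/* Content-Type header seen for this part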

while (<>) {
	s/\r$//;	# bogofilter can't deal with \r correctly
	# a blank line ends the headers; any encoding announced there applies from here on
	if (/^$/) {
		$inqp = $qpapproaching;
		$inbase64 = $base64approaching;
	}
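	# a MIME boundary (or any line starting with "--") or an mbox "From " line
	# is taken as the end of the current part: reset all state
	if (/^--[^>]/ || /^From /) {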
		$inbase64 = 0;
		$inqp = 0;
		$base64approaching = 0;
		$qpapproaching = 0;
		$textattachment = 0;
	}
	if ($textattachment) {
		if ($inbase64) {
			print db64($_);
			next;
		}
		if ($inqp) {
			print dqp($_);
			next;
		}
	}
	next if ($inbase64);	# base64 but not text: drop the line entirely
	if (/^Content-Type.*text\//i) {
		$textattachment++;
	}
	if (/^Content-Transfer-Encoding.*base64/i) {
		$base64approaching++;
	}
	if (/^Content-Transfer-Encoding.*quoted-printable/i) {
		$qpapproaching++;
	}
	$_ =~ s/\0//g;	# bogofilter can't deal with \0 correctly
	print;
}

print "\n";

sub dqp {
    my $res = shift;

    $res =~ s/=\n//ms;          # = at end of line continues on next line
    # '_' stands for hex 20 only in RFC 2047 encoded header words, not in
    # message bodies, so this substitution stays disabled
    #$res =~ s/_/=20/g;
    $res =~ s/=([\da-fA-F]{2})/pack("C", hex($1))/ge;
    $res =~ s/\0//g;	# bogofilter can't deal with \0 correctly
    $res;
}

sub db64 {
    local($^W) = 0; # unpack("u",...) gives bogus warning in 5.001m

    my $str = shift;
    my $res = "";
   
    $str =~ tr|A-Za-z0-9+/||cd;             # remove non-base64 chars (padding)
    $str =~ tr|A-Za-z0-9+/| -_|;            # convert to uuencoded format
    while ($str =~ /(.{1,60})/gs) {
        my $len = chr(32 + length($1)*3/4); # compute length byte
        $res .= unpack("u", $len . $1 );    # uudecode
    }
    $res =~ s/\0//g;	# bogofilter can't deal with \0 correctly
    $res;
}
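
For comparison, a sketch that checks the uuencode trick used in db64 against the MIME::Base64 module, assuming a Perl recent enough to bundle that module. `db64_via_uu` is a hypothetical stand-alone copy of the routine above without the \0 stripping, and the sample string is made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);

# hypothetical copy of the db64 approach: map the base64 alphabet onto the
# uuencode alphabet, prepend a length byte, and let unpack("u") do the work
sub db64_via_uu {
    my $str = shift;
    my $res = "";

    $str =~ tr|A-Za-z0-9+/||cd;     # drop padding, newlines, anything non-base64
    $str =~ tr|A-Za-z0-9+/| -_|;    # same 64 values, uuencode characters
    while ($str =~ /(.{1,60})/gs) {
        # 4 encoded characters carry 3 bytes; chr() truncates any fraction
        my $len = chr(32 + length($1) * 3 / 4);
        $res .= unpack("u", $len . $1);
    }
    return $res;
}

my $line = "SGVsbG8sIHdvcmxkIQ==\n";    # one base64 line, as it would appear in a message

print db64_via_uu($line), "\n";
print decode_base64($line), "\n";
```

Both calls should print the same text. Where MIME::Base64 is available, `decode_base64` could stand in for the uuencode conversion.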


