base64, quoted-printable pre-processor for bogofilter
Allyn Fratkin
allyn at fratkin.com
Wed Oct 23 06:11:58 CEST 2002
attached is a preprocessor that i use to decode base64 and quoted-printable
text attachments before feeding them to bogofilter. note that i do not
use the -p feature to add the bogofilter header to the email.
THIS IS NOT TO BE USED WITH -p !!
this is a very quick-and-dirty script that does not really understand
mime syntax very well and is not intended to transform the message
in syntactically correct ways. this is only useful as a dead-end
"nobody but bogofilter will be looking at the output so it doesn't
need to be correct" filter. in fact, non-text base64 attachments are
completely removed from the output!
THIS IS NOT TO BE USED WITH -p !!
it uses a state machine to notice when a text attachment is approaching,
then decodes until it thinks it has found the end of the attachment.
it can be easily fooled during quoted-printable attachments (for example,
any body line that starts with "--" resets the state and stops the decoding).
THIS IS NOT TO BE USED WITH -p !!
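to make the state machine easier to follow, here is a condensed sketch of the same transitions as the loop in the attached script. it only labels each line instead of decoding it. the function name classify_lines, the labels, and the sample message are made up for illustration and are not part of the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Label each line the way the main loop of unbase64 would treat it:
#   decode-b64 / decode-qp : line is inside an encoded text part
#   drop                   : line is inside a non-text base64 part
#   pass                   : line is printed unchanged
# (classify_lines and the labels are hypothetical, for illustration only)
sub classify_lines {
    my @lines = @_;
    my ($inb64, $inqp, $b64next, $qpnext, $text) = (0) x 5;
    my @out;
    for my $line (@lines) {
        if ($line =~ /^$/) {
            # blank line ends the part headers: pending encodings take effect
            $inqp  = $qpnext;
            $inb64 = $b64next;
        }
        if ($line =~ /^--[^>]/ || $line =~ /^From /) {
            # boundary-looking line or mbox "From " line: reset everything
            ($inb64, $inqp, $b64next, $qpnext, $text) = (0) x 5;
        }
        if ($text && $inb64) { push @out, 'decode-b64'; next; }
        if ($text && $inqp)  { push @out, 'decode-qp';  next; }
        if ($inb64)          { push @out, 'drop';       next; }
        $text    = 1 if $line =~ /^Content-Type.*text\//i;
        $b64next = 1 if $line =~ /^Content-Transfer-Encoding.*base64/i;
        $qpnext  = 1 if $line =~ /^Content-Transfer-Encoding.*quoted-printable/i;
        push @out, 'pass';
    }
    return @out;
}

# made-up two-part message: one base64 text part, one base64 image part
my @msg = (
    'Content-Type: multipart/mixed; boundary="b"',
    '',
    '--b',
    'Content-Type: text/plain',
    'Content-Transfer-Encoding: base64',
    '',
    'aGVsbG8=',
    '--b',
    'Content-Type: image/png',
    'Content-Transfer-Encoding: base64',
    '',
    'iVBORw0KGgo=',
    '--b--',
);

my @labels = classify_lines(@msg);
printf "%-12s %s\n", $labels[$_], $msg[$_] for 0 .. $#msg;
```

note that the blank line after the part headers already counts as part of the body, which matches what the script does.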
i am sending it because someone might find it useful.
--
Allyn Fratkin allyn at fratkin.com
Escondido, CA http://www.fratkin.com/
-------------- next part --------------
#!/usr/bin/perl
#
# unbase64
#
# allyn fratkin <allyn at fratkin.com>
#
# DO NOT USE AS A FILTER!!
# USE ONLY AS INPUT TO BOGOFILTER WITHOUT -p OPTION!
#
# decodes base64 and quoted-printable text attachments
# deletes binary base64 attachments
# other minor preprocessing for bogofilter
#
# decoding routines were "lifted" from dmmh script found on internet
# dmmh author is Per Hedeland <per at erix.ericsson.se>
#
$inbase64 = 0;
$inqp = 0;
$base64approaching = 0;
$qpapproaching = 0;
$textattachment = 0;
while (<>) {
    s/\r$//; # bogofilter can't deal with \r correctly
    if (/^$/) {
        # blank line ends the headers: pending encodings take effect
        $inqp = $qpapproaching;
        $inbase64 = $base64approaching;
    }
    if (/^--[^>]/ || /^From /) {
        # mime boundary or mbox "From " line: reset all state
        $inbase64 = 0;
        $inqp = 0;
        $base64approaching = 0;
        $qpapproaching = 0;
        $textattachment = 0;
    }
    if ($textattachment) {
        if ($inbase64) {
            print db64($_);
            next;
        }
        if ($inqp) {
            print dqp($_);
            next;
        }
    }
    next if ($inbase64); # non-text base64 attachment: drop the line
    if (/^Content-Type.*text\//i) {
        $textattachment++;
    }
    if (/^Content-Transfer-Encoding.*base64/i) {
        $base64approaching++;
    }
    if (/^Content-Transfer-Encoding.*quoted-printable/i) {
        $qpapproaching++;
    }
    $_ =~ s/\0//g; # bogofilter can't deal with \0 correctly
    print;
}
print "\n";
sub dqp {
    my $res = shift;
    $res =~ s/=\n//ms; # = at end of line continues on next line
    #$res =~ s/_/=20/g; # code hex 20 may be encoded as '_'
    $res =~ s/=([\da-fA-F]{2})/pack("C", hex($1))/ge;
    $res =~ s/\0//g; # bogofilter can't deal with \0 correctly
    $res;
}
sub db64 {
    local($^W) = 0; # unpack("u",...) gives bogus warning in 5.001m
    my $str = shift;
    my $res = "";
    $str =~ tr|A-Za-z0-9+/||cd; # remove non-base64 chars (padding)
    $str =~ tr|A-Za-z0-9+/| -_|; # convert to uuencoded format
    while ($str =~ /(.{1,60})/gs) {
        my $len = chr(32 + length($1)*3/4); # compute length byte
        $res .= unpack("u", $len . $1 ); # uudecode
    }
    $res =~ s/\0//g; # bogofilter can't deal with \0 correctly
    $res;
}
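
the db64 routine above decodes base64 by translating it into the uuencode alphabet and handing it to unpack("u"). the sketch below repeats that trick in a standalone sub (db64_uu, a name made up here) and checks it against decode_base64 from the MIME::Base64 module. using that module is an assumption of this sketch, not something the script does; it would be an alternative where the module is installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64 encode_base64);

# Standalone copy of the translation trick used by db64 (hypothetical name).
sub db64_uu {
    my $str = shift;
    my $res = "";
    $str =~ tr|A-Za-z0-9+/||cd;   # drop everything outside the base64 alphabet
    $str =~ tr|A-Za-z0-9+/| -_|;  # map the 64 symbols onto ' ' .. '_' (uuencode)
    while ($str =~ /(.{1,60})/gs) {
        # uuencoded lines start with a length byte: 32 + number of decoded bytes
        my $len = chr(32 + int(length($1) * 3 / 4));
        $res .= unpack("u", $len . $1);
    }
    return $res;
}

# Compare both decoders on a few sample strings, one encoded line at a time,
# which is how the script feeds db64.
for my $plain ("hello, world!", "a", "ab", "abc", "x" x 57) {
    my $line = encode_base64($plain, "");   # "" means no trailing newline
    my $same = db64_uu($line) eq decode_base64($line);
    printf "%-3d bytes: %s\n", length($plain), $same ? "match" : "differ";
}
```

the translation only lines up when each input line holds whole 4-character groups (plus optional padding at the end), which is what normally formatted base64 attachments provide.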