Parsing of certain MIME messages, e.g. Vonage
Matt Garretson
mattg at assembly.state.ny.us
Wed Oct 14 21:36:21 CEST 2009
Greetings, all. Over the years, I've noticed that bogofilter
sometimes seems to mis-parse messages with MIME attachments.
Usually, it correctly skips over non-text and non-HTML
attachments, but sometimes it ends up tokenizing the encoded
strings of binary attachments. This usually leads to a score
of 0.5 due to dozens/hundreds/thousands of brand-new tokens.
I've not figured out what characteristics trigger this issue,
but one consistent offender is voicemail messages from Vonage.
A simplified example is at:
http://pastebin.com/m13e6623
At first glance, the boundary string seems odd, though I'm
not sure if that's the root of the problem. My bogolexer
output showing the errant tokens is here:
http://pastebin.com/m3fb9a0bd
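One way to check whether the MIME structure itself (for instance
the boundary string) is at fault would be to run the message through
an independent parser and see which parts it considers text. Below is
a minimal sketch using Python's standard email package. The SAMPLE
message and the list_parts helper are made up for illustration; this
is not the actual Vonage message, and it says nothing about how
bogofilter's own lexer decides what to tokenize.

```python
from email import message_from_string

# Hypothetical stand-in for a voicemail notification: one text part
# plus one base64-encoded audio attachment. NOT the real Vonage mail.
SAMPLE = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "You have a new voicemail.\n"
    "--XYZ\n"
    'Content-Type: audio/x-wav; name="voicemail.wav"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "UklGRiQAAABXQVZFZm10IBAAAAABAAEA\n"
    "--XYZ--\n"
)


def list_parts(raw):
    """Return (content_type, is_text) for every leaf part of a message."""
    msg = message_from_string(raw)
    parts = []
    for part in msg.walk():
        # Multipart containers only hold other parts; skip them.
        if part.is_multipart():
            continue
        ctype = part.get_content_type()
        # Only text/* parts would be expected to yield real tokens.
        parts.append((ctype, part.get_content_maintype() == "text"))
    return parts


if __name__ == "__main__":
    for ctype, is_text in list_parts(SAMPLE):
        print(ctype, "text" if is_text else "skip")
```

If the real message, fed through the same helper, shows the audio part
as a separate non-text part, the boundary is probably being read
correctly by at least one parser, and the problem would more likely be
specific to how bogofilter handles it.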
Any thoughts? My Bogofilter version is 1.2.1 built from source
on Fedora 11.
Thanks,
-Matt