Parsing of certain MIME messages, e.g. Vonage

Matt Garretson mattg at assembly.state.ny.us
Wed Oct 14 21:36:21 CEST 2009


Greetings, all. Over the years, I've noticed that bogofilter
sometimes seems to mis-parse messages with MIME attachments.
Usually, it correctly skips over attachments that are neither
text nor HTML, but sometimes it ends up tokenizing the encoded
strings of binary attachments. This usually leads to a score
of 0.5 due to dozens/hundreds/thousands of brand-new tokens.
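
For comparison, here is a minimal sketch of the expected behaviour,
written with Python's standard email module. It is not bogofilter's
lexer, only an independent check of which MIME parts are text and
which are not. The function name classify_parts and the SAMPLE
message (including its boundary and placeholder payload) are made up
for illustration.

```python
from email import message_from_bytes


def classify_parts(raw):
    """Return (content_type, should_tokenize) for each leaf MIME part."""
    msg = message_from_bytes(raw)
    result = []
    for part in msg.walk():
        # Multipart containers hold other parts; only leaves carry content.
        if part.is_multipart():
            continue
        ctype = part.get_content_type()
        # Assumption: only plain-text and HTML bodies should be tokenized.
        result.append((ctype, ctype in ("text/plain", "text/html")))
    return result


# Hypothetical two-part message: a text note plus a base64 audio attachment.
SAMPLE = (
    b"From: sender@example.com\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="XYZ"\r\n'
    b"\r\n"
    b"--XYZ\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"You have a new voicemail.\r\n"
    b"--XYZ\r\n"
    b'Content-Type: audio/x-wav; name="voicemail.wav"\r\n'
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"UklGRiQAAABXQVZFZm10IBAAAAABAAEA\r\n"
    b"--XYZ--\r\n"
)

if __name__ == "__main__":
    for ctype, tokenize in classify_parts(SAMPLE):
        print(ctype, "tokenize" if tokenize else "skip")
```

Running the same check on a real offending message and comparing the
result with the bogolexer output should show whether the base64 part
is being treated as text.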

I've not figured out what characteristics trigger this issue,
but one consistent offender is voicemail messages from Vonage.
A simplified example is at:

  http://pastebin.com/m13e6623

At first glance, the boundary string seems odd, though I'm
not sure if that's the root of the problem. My bogolexer
output showing the errant tokens is here:

  http://pastebin.com/m3fb9a0bd
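
One way to test the boundary suspicion is to check the boundary
parameter against the grammar in RFC 2046 (1 to 70 characters from a
restricted set, not ending in a space). The sketch below does that
with Python's standard library; the function name boundary_report is
hypothetical, and the headers shown in the comments are invented, not
taken from the Vonage message.

```python
import re
from email import message_from_bytes

# RFC 2046 "boundary": 0-69 bchars followed by one bcharsnospace.
_BOUNDARY_RE = re.compile(
    r"^[0-9A-Za-z'()+_,\-./:=? ]{0,69}[0-9A-Za-z'()+_,\-./:=?]$"
)


def boundary_report(raw):
    """Return (boundary, is_rfc2046_conformant) for a raw message.

    Returns (None, False) when the message declares no boundary.
    """
    msg = message_from_bytes(raw)
    boundary = msg.get_boundary()
    if boundary is None:
        return None, False
    return boundary, bool(_BOUNDARY_RE.match(boundary))


if __name__ == "__main__":
    # Invented header for illustration only.
    headers = (
        b'Content-Type: multipart/mixed; boundary="----=_Part_123"\r\n'
        b"\r\n"
    )
    print(boundary_report(headers))
```

A boundary that passes this check can still trip up a parser, so a
conformant result only rules out one possible cause.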

Any thoughts? I'm running bogofilter 1.2.1, built from source
on Fedora 11.

Thanks,
-Matt


