decoding implementation

Gyepi SAM gyepi at praxis-sw.com
Mon Nov 25 07:20:28 CET 2002


On Sun, Nov 24, 2002 at 10:48:04PM +0100, Matthias Andree wrote:
> On Sat, 23 Nov 2002, Gyepi SAM wrote:
> 
> > I am looking for suggestions on adding 
> > base64 and quoted-printable decoding to bogofilter.
> > 
> > There are two issues I'd like to discuss:
> > 
> > 1. Data representation. Should we modify the
> > [Content-Transfer-Encoding] headers of a message after it has been
> > decoded (for consistency and truthfulness) or should we leave the
> > headers alone (preserve information)
> 
> I have thought about that as well. Basically, we can offer three -p
> modes: original, decoded and canonical. original would be whatever the
> original mail was (yes I know this is a problem currently on servers
> with low RAM that route big emails); decoded would be 8bit, original
> character set; and canonical would be 8bit/utf-8 or something.

Sorry, the question was really about what bogofilter sees, not
how we deliver to the end user. So when we decode a base64-encoded email,
do we leave the Content-Transfer-Encoding header in there for bogofilter to
pick up, or remove it since the data is no longer encoded? No matter what we
do, we always send the original email to the user (assuming the -p option is in effect).

I can understand your arguments against the pipe() + fork() combo.
You're right. Doing it well requires a lot of code that is not
particularly focused on mail filtering.

> > case d: more elegant but harder.
> I'm not sure I understood that suggestion. "coroutines"? Could you
> define this suggestion?

Coroutines are peer functions that call each other.
Instead of the standard model where routine a calls routine b,
b returns, and a continues after the call, coroutines yield control
to each other when necessary and, when control comes back to them, pick up
execution where they left off. This works particularly well for
consumer/producer problems of the type we are discussing.

Knuth discusses them in some detail. Edgar Toernig has an implementation
at http://www.goron.de/~froese/, but it is x86-specific.

http://www.chiark.greenend.org.uk/~sgtatham/coroutines.html also contains
discussion of coroutines.

Like I said, it would be difficult, especially if we're going for portability.

> If and only if we know by our options we might need to rewind AND stdin
> is not a regular file, then copy it to the temp file as we're reading
> (like tee(1) would do), and on the second go, read from the temp file.

This is a good idea.

-Gyepi


