Encoded headers parser

Junior jxz at uol.com.br
Thu Jul 24 21:06:58 CEST 2003


Hello! I'm using bogofilter 0.14.0.1.

echo 'Subject: =?iso-8859-1?Q?t=E9ste!?=' | bogolexer -p
subj:iso-8859-1
subj:E9ste!

I don't know whether such headers are standards-compliant, but several
ham and spam messages have encoded header text that is parsed
incorrectly. I created the example above in mutt, so I guess it is
compliant.
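
For comparison, here is a minimal sketch of what a decoder is expected to produce for that header. It uses Python's standard email.header module as a stand-in, not bogofilter's own code, and decode_subject is only a hypothetical helper name. As far as I can tell, RFC 2047 excludes only '?' and space from encoded text in general, so a literal '!' in a Subject header should be fine.

```python
from email.header import decode_header


def decode_subject(raw: str) -> str:
    """Decode RFC 2047 encoded words in a header value into plain text."""
    parts = []
    for chunk, charset in decode_header(raw):
        if isinstance(chunk, bytes):
            # Encoded words come back as bytes plus their declared charset.
            parts.append(chunk.decode(charset or "ascii", errors="replace"))
        else:
            # Header values without encoded words come back as str.
            parts.append(chunk)
    return "".join(parts)


if __name__ == "__main__":
    print(decode_subject("=?iso-8859-1?Q?t=E9ste!?="))
```

The decoded text should be a single word, which is what the lexer would ideally hand on as one token.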

I added '!?.' to the list of valid QP chars, and things seem to be
working here. The errors occurred wherever these chars were present in
the quoted-printable encoded headers.

B64_OR_QP   [0-9a-zA-Z\-\+\/\=_:!\?\.]+
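
To illustrate the effect of the wider character class, here is a rough sketch in Python rather than flex. OLD_CLASS is an assumption: it is simply the class above with '!', '?' and '.' taken out again, and may not match the definition shipped in 0.14.0.1 exactly. The names covers, OLD_CLASS and NEW_CLASS are hypothetical.

```python
import re

# Assumption: the class before the change, i.e. the posted one minus '!', '?', '.'
OLD_CLASS = re.compile(r"[0-9a-zA-Z\-\+\/\=_:]+")
# The class as posted above.
NEW_CLASS = re.compile(r"[0-9a-zA-Z\-\+\/\=_:!\?\.]+")

# The encoded text between "?Q?" and "?=" in the example header.
ENCODED_TEXT = "t=E9ste!"


def covers(pattern: re.Pattern, text: str) -> bool:
    """Return True if the character class matches the whole encoded text."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    print("old class covers text:", covers(OLD_CLASS, ENCODED_TEXT))
    print("new class covers text:", covers(NEW_CLASS, ENCODED_TEXT))
```

With the narrower class the match stops before the '!', so a rule that expects the closing "?=" right after the encoded text cannot succeed on this header.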

And now:

echo 'Subject: =?iso-8859-1?Q?t=E9ste!?=' | bogolexer -p
subj:téste!

Another question: will bogofilter ever support storing double tokens
(phrases)? It would improve accuracy; is it left out because of the
performance cost?
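
By double tokens I mean something like the sketch below: each pair of adjacent tokens is stored as an extra token next to the single ones. This is only an illustration of the idea, not anything bogofilter does; with_pairs and the '*' separator are made up for the example.

```python
from typing import List, Sequence


def with_pairs(tokens: Sequence[str], sep: str = "*") -> List[str]:
    """Return the single tokens followed by every adjacent token pair."""
    pairs = [f"{a}{sep}{b}" for a, b in zip(tokens, tokens[1:])]
    return list(tokens) + pairs


if __name__ == "__main__":
    for token in with_pairs(["buy", "cheap", "pills"]):
        print(token)
```

A message with n tokens yields n - 1 extra pair tokens, and the set of distinct pairs is far larger than the set of distinct words, which is where I would expect the cost in database size and lookup time to come from.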

Thanks!

-- 
Junior
jxz at uol.com.br 
http://jxz.dontexist.org/