Encoded tôkens of subjéct with làrge têxt get split fróm MUA
Junior
jxz at uol.com.br
Fri Aug 1 02:20:32 CEST 2003
Hello. The following subject confuses the lexer:
Subject: Encoded =?iso-8859-1?Q?t=F4kens_of_subj=E9?=
=?iso-8859-1?Q?ct_with_l=E0rge_t=EAxt_get_spllited_fr=F3m?= MUA
Output from bogolexer:
subj:Encoded
subj:tôkens
subj:subj
with
làrge
têxt
get
spllited
fróm
MUA
I'm using bogofilter version 0.14.0.1.cvs.20030730.
Words get split in the middle. I think this header is RFC compliant (RFC 2047,
section 6.2, says that whitespace separating two adjacent encoded-words is
ignored), and I regularly receive mail with headers like this. The lexer should
ignore the whitespace between the two encoded parts.
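The rule quoted above can be illustrated with a small decoder: unfold the header, decode each encoded-word, and drop whitespace only when it sits between two encoded-words, so the lexer would see one run of text. This is a minimal Python sketch, not bogofilter's lexer (bogofilter itself is written in C); the names `decode_header_value`, `_decode_q` and `_decode_b` are hypothetical, and it assumes the charset label is one Python's codecs recognise (otherwise the raw encoded-word is kept).

```python
import base64
import re

# One RFC 2047 encoded-word: =?charset?encoding?encoded-text?=
# (encoded-text is printable ASCII other than '?').
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([QqBb])\?([\x21-\x3e\x40-\x7e]*)\?=")


def _decode_q(text: str, charset: str) -> str:
    """Decode 'Q' encoding: '_' means space, '=XX' is a hex-escaped byte."""
    raw = text.replace("_", " ").encode("ascii")
    raw = re.sub(rb"=([0-9A-Fa-f]{2})", lambda m: bytes([int(m.group(1), 16)]), raw)
    return raw.decode(charset, errors="replace")


def _decode_b(text: str, charset: str) -> str:
    """Decode 'B' encoding: plain base64."""
    return base64.b64decode(text).decode(charset, errors="replace")


def decode_header_value(value: str) -> str:
    """Decode encoded-words, ignoring whitespace between adjacent ones."""
    # Unfold: a line break followed by space or tab is only folding.
    value = re.sub(r"\r?\n(?=[ \t])", "", value)
    pieces = []
    pos = 0
    for index, m in enumerate(ENCODED_WORD.finditer(value)):
        gap = value[pos:m.start()]
        # RFC 2047 section 6.2: whitespace that only separates two
        # encoded-words is dropped; any other text is kept verbatim.
        if index == 0 or gap.strip():
            pieces.append(gap)
        charset, encoding, text = m.groups()
        try:
            if encoding in "Qq":
                pieces.append(_decode_q(text, charset))
            else:
                pieces.append(_decode_b(text, charset))
        except (LookupError, ValueError):
            # Unknown charset or malformed base64: keep the raw encoded-word.
            pieces.append(m.group(0))
        pos = m.end()
    pieces.append(value[pos:])
    return "".join(pieces)


subject = ("Encoded =?iso-8859-1?Q?t=F4kens_of_subj=E9?=\n"
           " =?iso-8859-1?Q?ct_with_l=E0rge_t=EAxt_get_spllited_fr=F3m?= MUA")
print(decode_header_value(subject))
# -> Encoded tôkens of subjéct with làrge têxt get spllited fróm MUA
```

Tokenizing the decoded string instead of the raw header would keep "subjéct" in one piece.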
More examples:
Subject: =?iso-8859-1?Q?COMO_ESTUDAR_-_1000_Dicas_de_T=E9cnicas_e_Organiza=E7=E3o_?=
=?iso-8859-1?Q?do_Estudo?=
This one doesn't break a word in the middle.
Subject: =?iso-8859-1?Q?Re:_RES:_=5Bfoo=5D_Por_que_o_.NET_=E9_melhor_que_o_Ja?=
=?iso-8859-1?Q?va?=
This one breaks "Java".
Subject: =?iso-8859-1?Q?O_ESTADO_DE_S=C3O_PAULO_-_Guinada_do_Polo_Magn=E9tico_da_T?=
=?iso-8859-1?Q?erra?=
This one breaks "Terra".
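As a cross-check of what these three subjects should decode to, Python's standard `email.header` module already applies the RFC 2047 rule: `decode_header()` drops the whitespace between adjacent encoded-words and joins those that share a charset, and `make_header()` rebuilds a header whose `str()` is the decoded text. A quick sketch assuming Python 3; it only shows the expected decoding and says nothing about how bogofilter does it.

```python
from email.header import decode_header, make_header

# Subject values from the examples above, with the folding line break kept.
SUBJECTS = [
    "=?iso-8859-1?Q?COMO_ESTUDAR_-_1000_Dicas_de_T=E9cnicas_e_Organiza=E7=E3o_?=\n"
    " =?iso-8859-1?Q?do_Estudo?=",
    "=?iso-8859-1?Q?Re:_RES:_=5Bfoo=5D_Por_que_o_.NET_=E9_melhor_que_o_Ja?=\n"
    " =?iso-8859-1?Q?va?=",
    "=?iso-8859-1?Q?O_ESTADO_DE_S=C3O_PAULO_-_Guinada_do_Polo_Magn=E9tico_da_T?=\n"
    " =?iso-8859-1?Q?erra?=",
]

for raw in SUBJECTS:
    # decode_header() yields (bytes, charset) chunks; make_header() turns
    # them into a Header object, and str() gives the decoded text.
    print(str(make_header(decode_header(raw))))
```

The second and third subjects come out with "Java" and "Terra" whole.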
See you later, best regards.
--
Junior
j x z _em_ u o l c o m b r