Encoded tôkens of subjéct with làrge têxt get splitted fróm MUA

Junior jxz at uol.com.br
Fri Aug 1 02:20:32 CEST 2003


Hello. This subject confuses the lexer.

	Subject: Encoded =?iso-8859-1?Q?t=F4kens_of_subj=E9?=
			=?iso-8859-1?Q?ct_with_l=E0rge_t=EAxt_get_spllited_fr=F3m?= MUA

Output from bogolexer:

	subj:Encoded
	subj:tôkens
	subj:subj
	with
	làrge
	têxt
	get
	spllited
	fróm
	MUA

I'm using bogofilter version 0.14.0.1.cvs.20030730.		
		
The words are split in the middle. I think this header is RFC compliant
(RFC 2047), and I regularly receive emails with headers like this. The lexer
should ignore the whitespace between two adjacent encoded parts.
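
To illustrate the behaviour I would expect, here is a minimal Python sketch. It is not bogofilter's code, and the names `decode_q` and `decode_header_value` are made up for this example. It assumes the rule from RFC 2047 that whitespace separating two adjacent encoded-words is dropped before decoding, so a word folded across two encoded parts comes out whole.

```python
import base64
import re

# RFC 2047 encoded-word: =?charset?encoding?text?=
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=")
# Whitespace (including a folded line break) between two adjacent encoded-words
BETWEEN_WORDS = re.compile(r"(\?=)[ \t\r\n]+(?==\?)")


def decode_q(text: str) -> bytes:
    """Decode the "Q" encoding: "_" means space, "=XX" is a hex-escaped byte."""
    text = text.replace("_", " ")
    unescaped = re.sub(
        r"=([0-9A-Fa-f]{2})", lambda m: chr(int(m.group(1), 16)), text
    )
    return unescaped.encode("latin-1")


def decode_header_value(value: str) -> str:
    """Decode encoded-words, ignoring whitespace between adjacent ones."""
    # Step 1: glue adjacent encoded-words together.
    joined = BETWEEN_WORDS.sub(r"\1", value)

    # Step 2: decode each encoded-word in place.
    def repl(match: re.Match) -> str:
        charset = match.group(1)
        encoding = match.group(2).upper()
        text = match.group(3)
        raw = decode_q(text) if encoding == "Q" else base64.b64decode(text)
        return raw.decode(charset, errors="replace")

    return ENCODED_WORD.sub(repl, joined)


subject = (
    "Encoded =?iso-8859-1?Q?t=F4kens_of_subj=E9?=\n"
    "\t\t\t=?iso-8859-1?Q?ct_with_l=E0rge_t=EAxt_get_spllited_fr=F3m?= MUA"
)
print(decode_header_value(subject))
```

With this approach "subjéct" stays a single word, while the space between the last encoded part and the plain "MUA" is preserved because "MUA" is not an encoded-word.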

More examples:

	Subject: =?iso-8859-1?Q?COMO_ESTUDAR_-_1000_Dicas_de_T=E9cnicas_e_Organiza=E7=E3o_?=
			=?iso-8859-1?Q?do_Estudo?=

This doesn't break a word in the middle.
		
	Subject: =?iso-8859-1?Q?Re:_RES:_=5Bfoo=5D_Por_que_o_.NET_=E9_melhor_que_o_Ja?=
			=?iso-8859-1?Q?va?=

This breaks "Java".

	Subject: =?iso-8859-1?Q?O_ESTADO_DE_S=C3O_PAULO_-_Guinada_do_Polo_Magn=E9tico_da_T?=
			=?iso-8859-1?Q?erra?=

This breaks "Terra".

See you later, best regards.

-- 
Junior
j x z _em_ u o l  c o m  b r 