Quoted-Printable [was: Problems with Asian Spam]

David Relson relson at osagesoftware.com
Thu Nov 23 14:47:27 CET 2006


On Thu, 23 Nov 2006 08:30:02 +0000 (UTC)
stefan wrote:

> Just coming back to Asian spam: I searched the database file for
> entries that appear in these mails. I did not find them (because I
> do not know what is really saved). I found that mails with a body
> like this are not detected correctly as spam:
> --
> Content-Type: text/html;
> Content-Transfer-Encoding: quoted-printable
> 
>   
>       =B9=AB=C1=F8-19
>   
>   
> 
> 
> =C0=CE=C5=CD=B3=DD =B1=DD=C0=B6 
> =C0=FC=B9=AE=BD=CE=C0=CC=C6=AE=
> 
>  
> --
> This is only part of the whole mail and will be interpreted by the
> mail program as nice colored Asian text with correct HTML code. 
> I searched for "=B9=AB=C1=F8-19" and "=B9" in the database, but I was
> not successful. Any idea what I am doing wrong?
> 
> Thank you
> Stefan.

Hi Stefan,

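Nothing wrong.  Quoted-printable is an encoding of the actual text so
that 8-bit characters, i.e. bytes with hex values from 0x80 to 0xFF,
can be transmitted using 7-bit characters.  Each such byte is written
as "=" followed by two hex digits, so "=B9" stands for the single byte
0xB9.  Bogofilter decodes the MIME section, parses it, and uses the
resulting tokens for scoring.  The tokens therefore come from the
decoded text, which is why searching the database for the raw
"=B9=AB=C1=F8-19" string finds nothing.  The bogolexer program can be
used to see the results of the parsing.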
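
To make the decoding step concrete, here is a minimal sketch in Python
using the standard quopri module on the first string from your mail.
It only illustrates what quoted-printable decoding does to the bytes.
It is not bogofilter's own code, and the variable names are just for
illustration.

```python
import quopri

# The raw text exactly as it appears in the quoted message body.
encoded = b"=B9=AB=C1=F8-19"

# Each "=XX" escape is turned back into the single byte with hex value XX;
# plain ASCII such as "-19" passes through unchanged.
decoded = quopri.decodestring(encoded)

print(decoded)             # b'\xb9\xab\xc1\xf8-19'
print(b"=B9" in decoded)   # False: the "=" escapes are gone after decoding
```

The decoded bytes no longer contain "=B9", so a literal search for the
encoded form cannot match anything derived from the decoded text.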

As an example, I've taken your qp data and put it into a file named
msg.qp.txt, which is attached.  Run the command
"bogolexer -p -I msg.qp.txt" to see how the message is parsed.
Running bogofilter with "-vv" or "-vvv" will show scoring info, i.e.
    bogofilter -vv -I msg.qp.txt
    bogofilter -vvv -I msg.qp.txt

HTH,

David
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: msg.qp.txt
URL: <http://www.bogofilter.org/pipermail/bogofilter/attachments/20061123/1fc1c2af/attachment.txt>

