RFC-2047 [was: New spam trick]

David Relson relson at osagesoftware.com
Mon Jul 21 16:21:03 CEST 2003


At 09:59 AM 7/21/03, Boris 'pi' Piwinger wrote:
>David Relson wrote:
> >
> > Looks like bogofilter doesn't know about RFC-2047.  I guess it's time to
> > add RFC-2047 compliance to the TODO list.
>
>That was always important. But note that my remark
>refers to something which is not compliant with RFC 2047;
>actually, none of the above is. It should also work. This
>should greatly enhance spam detection. Without this, tagging
>subjects is almost useless.
>
>There is one question, though: decoding removes the charset
>info (as long as we have not implemented Unicode). So it
>might be a good idea to also add the charset to the token
>list (which will catch all that Asian spam).
>
>pi

At present, those lines generate the tokens subj:iso-8859-1 and 
subj:ISO-8859-1 (among others).  Try running the command "bogoutil -w 
YOUR_BOGODIR subj:iso-8859-1 subj:ISO-8859-1" to see what bogofilter thinks 
of those tokens.
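Those tokens appear because the encoded words are tokenized without being decoded, so the charset name ends up as a token of its own. pi's suggestion (decode RFC 2047 encoded words, but keep the charset as a separate token) could look roughly like the Python sketch below. It relies on the standard library's `email.header.decode_header`; the `subject_tokens` helper and the `subj:charset:` prefix are hypothetical names for illustration, not anything bogofilter provides, and how this decoder copes with the non-compliant variants pi mentions would need testing.

```python
from email.header import decode_header


def subject_tokens(subject):
    """Tokenize a Subject: header, decoding RFC 2047 encoded words first.

    For every encoded chunk, a hypothetical 'subj:charset:<name>' token is
    emitted so the charset information that decoding discards is kept.
    """
    tokens = []
    for chunk, charset in decode_header(subject):
        if isinstance(chunk, bytes):
            # decode_header returns bytes when the header held encoded words;
            # unencoded chunks come back with charset None (assume ASCII).
            try:
                text = chunk.decode(charset or "ascii", errors="replace")
            except LookupError:
                # Charset name Python does not recognise: fall back to latin-1.
                text = chunk.decode("latin-1")
        else:
            # No encoded words at all: decode_header returns the plain str.
            text = chunk
        if charset:
            tokens.append("subj:charset:" + charset)
        tokens.extend("subj:" + word.lower() for word in text.split())
    return tokens


if __name__ == "__main__":
    print(subject_tokens("=?ISO-8859-1?Q?Gr=FC=DFe?= from spam"))
```

Because `decode_header` lowercases the charset name, this sketch folds subj:iso-8859-1 and subj:ISO-8859-1 into a single token. The wordlist counts in this message suggest the spelling itself may carry some signal, so a real tokenizer might prefer to keep the original case.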

My wordlist shows:
                        spam   good
subj:iso-8859-1         117     12
subj:ISO-8859-1           5      6

which marks the first as a clear spam indicator and leaves the second 
indeterminate.  So, even though bogofilter doesn't know about RFC-2047, 
useful information is found in the headers.  Of course, there is room for 
improvement ...
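To make "clear spam indicator" and "indeterminate" a little more concrete, here is a Python sketch of Gary Robinson's smoothed token probability f(w), the general approach behind bogofilter's Robinson-style scoring. The message does not give the number of spam and good messages behind the wordlist, so the sketch assumes equal, hypothetical corpus sizes (1000 each). The strength `s` and background probability `x` are set to illustrative values (s = 1, x = 0.5), not bogofilter's actual defaults.

```python
def robinson_fw(spam_count, good_count, n_spam_msgs, n_good_msgs, s=1.0, x=0.5):
    """Robinson's smoothed spam probability for one token.

    f(w) = (s*x + n*p(w)) / (s + n), where n is the number of messages
    containing the token and p(w) is its raw spam probability.
    """
    n = spam_count + good_count          # messages containing the token
    if n == 0:
        return x                         # no evidence: fall back to the prior
    b = spam_count / n_spam_msgs         # fraction of spam messages with the token
    g = good_count / n_good_msgs         # fraction of good messages with the token
    p = b / (b + g)                      # raw spam probability p(w)
    return (s * x + n * p) / (s + n)


if __name__ == "__main__":
    # Counts from the wordlist above; corpus sizes are HYPOTHETICAL (equal).
    for token, spam, good in [("subj:iso-8859-1", 117, 12), ("subj:ISO-8859-1", 5, 6)]:
        print(f"{token:18} {robinson_fw(spam, good, 1000, 1000):.3f}")
    # prints 0.904 for the first token and 0.458 for the second
```

Under those assumptions the first token scores about 0.90, while the second sits at about 0.46, close to the neutral 0.5. With unequal corpus sizes both figures would shift.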

More information about the Bogofilter mailing list