RFC-2047 [was: New spam trick]
David Relson
relson at osagesoftware.com
Mon Jul 21 16:21:03 CEST 2003
At 09:59 AM 7/21/03, Boris 'pi' Piwinger wrote:
>David Relson wrote:
> >
> > Looks like bogofilter doesn't know about RFC-2047. I guess it's time to
> > add RFC-2047 compliance to the TODO list.
>
>That was important all along. But note that my remark
>refers to something which is not compliant with RFC 2047;
>actually, none of the above is. It should also work. This
>should greatly enhance spam detection. Without this, tagging
>subjects is almost useless.
>
>There is one question though: decoding removes the charset
>info (as long as we have not implemented Unicode). So it
>might be a good idea to also add the charset to the list
>(which will catch all that Asian spam).
>
>pi
At present, those lines will generate the tokens subj:iso-8859-1 and
subj:ISO-8859-1 (and others). Try running the command "bogoutil -w
YOUR_BOGODIR subj:iso-8859-1 subj:ISO-8859-1" to see what bogofilter thinks
of those tokens.
My wordlist shows:
                   spam  good
subj:iso-8859-1     117    12
subj:ISO-8859-1       5     6
which makes the first clearly spam and the second indeterminate. So, even
though bogofilter doesn't know about RFC-2047, useful information is found
in the headers. Of course, there is room for improvement ...
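
To give an idea of what RFC-2047 handling could look like, here is a minimal Python sketch. It is not bogofilter code (bogofilter is written in C) and makes no claim about how bogofilter would implement this. It decodes the encoded-words in a Subject header with the standard library's email.header.decode_header and, following pi's suggestion, keeps each charset as a token of its own. The function name subject_tokens and the whitespace-only tokenizing are assumptions made for illustration; only the subj: prefix comes from the tokens shown above.

```python
from email.header import decode_header


def subject_tokens(raw_subject):
    """Return illustrative subj: tokens for a raw Subject header value.

    Hypothetical helper: the name and the tokenizing rule are assumptions.
    """
    tokens = []
    for payload, charset in decode_header(raw_subject):
        if charset is not None:
            # Keep the charset itself as a token so it is not lost by decoding.
            tokens.append("subj:" + charset.lower())
        if isinstance(payload, bytes):
            try:
                text = payload.decode(charset or "ascii", errors="replace")
            except LookupError:
                # Unknown charset label: fall back to a byte-preserving decode.
                text = payload.decode("latin-1")
        else:
            # Headers without any encoded-word come back as plain str.
            text = payload
        # Naive whitespace split; a real tokenizer would be stricter.
        tokens.extend("subj:" + word for word in text.split())
    return tokens


print(subject_tokens("=?ISO-8859-1?Q?Buy_cheap_pills?="))
# ['subj:iso-8859-1', 'subj:Buy', 'subj:cheap', 'subj:pills']
```

In this sketch the charset token is lowercased, so the two spellings in the wordlist above would be counted as one token; whether that is desirable is an open question, since they currently score quite differently.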