question about new spam encoding
David Relson
relson at osagesoftware.com
Wed Nov 19 23:02:32 CET 2003
On Wed, 19 Nov 2003 16:38:12 -0500
Trevor Harrison <trevor-bogofilter at harrison.org> wrote:
> I just ran into a spam encoding that I haven't seen before. In a
> text/html message, instead of "text", they put
> &#116;&#101;&#120;&#116;
>
> Running thru bogolexer, all I'm seeing is the header tokens and some
> nbsp's, but no &#123;'s. I'm guessing they are considered individual
> tokens and are too short or something.
>
> The message is here: http://www.harrison.org/~trevor/spam1.txt
>
>
> -Trevor
>
Trevor,
Decoding of escaped HTML characters was added to bogofilter on
2003-10-06. It's in 0.15.7 and 0.15.8.
With a current version of bogofilter, the message decodes correctly as
"text". With an older version, the escapes are ignored because a number
is all that's left after removing the special characters, and bogofilter
doesn't convert numbers to tokens.
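To make the difference concrete, here is a minimal Python sketch. It is not bogofilter's lexer code. It assumes the spam spelled the word with decimal character references such as &#116;, and the variable names are made up for illustration.

```python
import html
import re

# Assumed sample: "text" written as decimal HTML character references.
encoded = "&#116;&#101;&#120;&#116;"

# Decoding the references first recovers the original word.
decoded = html.unescape(encoded)
print(decoded)

# Without decoding, splitting on the special characters leaves only
# numbers, which a lexer that skips pure numbers would then discard.
leftover = [piece for piece in re.split(r"[^0-9A-Za-z]+", encoded) if piece]
print(leftover)
```

The first print shows the decoded word. The second shows the bare numbers that remain when the escapes are not decoded.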
Take the attached file, msg.html.1119.txt, and run the command
"bogolexer -p < msg.html.1119.txt". You should see "text" in the output.
David