question about new spam encoding
Matt Garretson
mattg at assembly.state.ny.us
Wed Nov 19 23:15:23 CET 2003
Trevor Harrison wrote:
> I just ran into a spam encoding that I haven't seen before. In a
> text/html message, instead of "text", they put "&#116;&#101;&#120;&#116;"
>
> Running thru bogolexer, all I'm seeing is the header tokens and some
> nbsp's, but no &#123;'s.
Since the body part is text/html, bogofilter should be decoding all
the &#123; stuff into text and tokenizing based on that. For me,
with version 0.15.8, that is exactly what happens, and the message
is in fact detected as spam (spamicity=0.948258).
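As a rough illustration of the decode-then-tokenize order described above, here is a minimal Python sketch. It is not bogofilter's lexer; the names decode_numeric_refs and tokenize and the word pattern are hypothetical. It decodes only numeric character references (decimal such as &#116; and hex such as &#x74;) and leaves named entities such as &nbsp; untouched, an assumption chosen so that "nbsp" survives as a token, as it does in the bogolexer output.

```python
import re

# Decimal (&#116;) or hexadecimal (&#x74;) numeric character references.
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")
# Hypothetical word pattern: a letter followed by letters, digits, ' or -.
WORD = re.compile(r"[A-Za-z][A-Za-z0-9'-]*")


def decode_numeric_refs(text):
    """Replace numeric character references with the characters they encode."""
    def repl(m):
        code = int(m.group(1), 16) if m.group(1) else int(m.group(2))
        try:
            return chr(code)
        except (ValueError, OverflowError):
            return m.group(0)  # leave out-of-range references as-is
    return NUMERIC_REF.sub(repl, text)


def tokenize(text):
    """Decode numeric references first, then split into word tokens."""
    return WORD.findall(decode_numeric_refs(text))
```

For example, tokenize("&#82;efinance &#116;oday&nbsp;low") returns ['Refinance', 'today', 'nbsp', 'low']: the numeric references decode to letters, while the undecoded &nbsp; leaves a bare "nbsp" token behind.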
A sample of the body tokens found:
get_token: 1 "Refinance"
get_token: 1 "today"
get_token: 1 "low"
get_token: 1 "Save"
get_token: 1 "thousands"
get_token: 1 "nbsp"
get_token: 1 "dollars"
...etc...
However, I did notice two unexpected things. There's an http URL in
the body, http://www.quick-home-loan-search.biz/, which does not
get tokenized.
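For comparison, here is a minimal sketch of one way a body URL could be turned into tokens: keep the full hostname and also emit each dot-separated label. This is a hypothetical scheme written in Python, not a description of what bogofilter does or should do; URL_RE and url_tokens are illustrative names.

```python
import re
from urllib.parse import urlsplit

# Hypothetical URL matcher: http/https up to whitespace, quotes, or angle brackets.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def url_tokens(text):
    """Return the hostname of each URL in text, followed by its dot-separated labels."""
    tokens = []
    for url in URL_RE.findall(text):
        host = urlsplit(url).hostname  # lowercased, without port
        if not host:
            continue
        tokens.append(host)
        tokens.extend(label for label in host.split(".") if label)
    return tokens
```

Emitting both the whole hostname and its labels is only one design choice; it lets a single spammy domain and its recurring pieces (here "quick-home-loan-search" and "biz") each accumulate their own statistics.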
Also, an IP address from the header (200.59.68.139, in a Received: line)
doesn't get tagged with rcvd: or head:
get_token: 5 "200.59.68.139"
Is that to be expected?
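A minimal sketch of the kind of tagging the question assumes: tokens from Received: lines get an rcvd: prefix and tokens from other header lines get head:. The mapping of header names to prefixes is an assumption based only on the two prefixes mentioned above, tag_header_tokens is a hypothetical name, and folded (continuation) header lines are not handled.

```python
import re

# Hypothetical header token pattern; dots are allowed so IPs and hostnames stay whole.
HEADER_WORD = re.compile(r"[A-Za-z0-9][A-Za-z0-9.-]*")


def tag_header_tokens(header_lines):
    """Prefix tokens with rcvd: for Received: lines and head: for other headers."""
    tagged = []
    for line in header_lines:
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line (e.g. a folded continuation)
        prefix = "rcvd:" if name.strip().lower() == "received" else "head:"
        tagged.extend(prefix + tok for tok in HEADER_WORD.findall(value))
    return tagged
```

Under this assumption, the Received: line above would yield "rcvd:200.59.68.139" rather than a bare "200.59.68.139".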
-Matt