question about new spam encoding

David Relson relson at osagesoftware.com
Thu Nov 20 01:13:22 CET 2003


On Wed, 19 Nov 2003 17:47:43 -0500
Matt Garretson <mattg at assembly.state.ny.us> wrote:

> Matt Garretson wrote:
> > However, i did notice two things unexpected.  There's an http URL in
> > the body, http://www.quick-home-loan-search.biz/, which does not
> > get tokenized.
> > 
> > Also, an IP address from the header (200.59.68.139, in a Received:
> > line), which doesn't get tagged with rcvd: or head:

Matt,

URLs get special treatment.  The "block_on_subnets" config file option
causes 200.59.68.139 to become 4 tokens (along the lines of the class A,
B, C, and D subnets), i.e. "url:200.59.68.139", "url:200.59.68",
"url:200.59", and "url:200".  Evidently the special URL checks cause the
header tagging to be skipped.  Before I jump in and make changes, we
should decide what to do when both options (header_line_markup and
block_on_subnets) are enabled.
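
To make the subnet expansion concrete, here is a minimal Python sketch of
the splitting described above.  It is only an illustration, not
bogofilter's actual code: the function name subnet_tokens and its
"prefix" parameter are hypothetical, and the "url:" tag is taken from the
token examples above.

```python
def subnet_tokens(ip, prefix="url:"):
    """Return the full address plus its three shorter dotted prefixes.

    Hypothetical helper for illustration; not part of bogofilter.
    """
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError(f"not a dotted-quad IPv4 address: {ip!r}")
    # Longest form first: a.b.c.d, then a.b.c, then a.b, then a
    return [prefix + ".".join(octets[:n]) for n in range(4, 0, -1)]


for token in subnet_tokens("200.59.68.139"):
    print(token)
```

With the address from the Received: line this prints the four tokens
listed above, from the full address down to "url:200".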

> get_token: 1 "rcvd:from"
> get_token: 5 "200.59.68.139"
> get_token: 1 "rcvd:steastwood.harrison.org"
> get_token: 2 "head:Message-ID"

BTW, the numeric value corresponds to the token type (as returned by the
lexer).

> I'd expect www.quick-home-loan-search.biz to show up somewhere in
> there.

Tokens are limited to 30 chars, so long URLs are excluded :-(
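
As a rough illustration of that length limit, the Python sketch below
drops any candidate token longer than 30 characters.  The names
MAX_TOKEN_LEN and keep_token are hypothetical, and whether a token of
exactly 30 characters is kept or dropped is an assumption here; the real
lexer may draw the line differently.

```python
MAX_TOKEN_LEN = 30  # limit quoted above; the inclusive comparison is an assumption


def keep_token(text, max_len=MAX_TOKEN_LEN):
    """Hypothetical filter: keep a token only if it fits within max_len."""
    return len(text) <= max_len


candidates = ["rcvd:from", "http://www.quick-home-loan-search.biz/"]
kept = [t for t in candidates if keep_token(t)]
print(kept)
```

The full URL from the message body is 38 characters long, so under this
sketch it is filtered out while the short header token survives.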



