question about new spam encoding
David Relson
relson at osagesoftware.com
Thu Nov 20 01:13:22 CET 2003
On Wed, 19 Nov 2003 17:47:43 -0500
Matt Garretson <mattg at assembly.state.ny.us> wrote:
> Matt Garretson wrote:
> > However, I did notice two unexpected things. There's an http URL in
> > the body, http://www.quick-home-loan-search.biz/, which does not
> > get tokenized.
> >
> > Also, an IP address from the header (200.59.68.139, in a Received:
> > line), which doesn't get tagged with rcvd: or head:
Matt,
URLs get special treatment. The "block_on_subnets" config file option
causes 200.59.68.139 to become 4 tokens (along the lines of the class A,
B, C, and D subnets), i.e. "url:200.59.68.139", "url:200.59.68",
"url:200.59", and "url:200". Evidently the special URL checks cause the
header tagging to be skipped. Before I jump in and make changes, we
should decide what to do when both options (header_line_markup and
block_on_subnets) are enabled.
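
To illustrate the subnet expansion described above, here is a rough
Python sketch. It is not bogofilter's code (bogofilter is written in C);
the function name subnet_tokens and its prefix parameter are made up for
the example, and only the "url:" prefix and the four resulting tokens
come from the description above.

```python
def subnet_tokens(ip, prefix="url:"):
    """Expand a dotted-quad address into four progressively shorter tokens.

    Hypothetical helper for illustration only; it mirrors the four tokens
    listed in this message, not bogofilter's actual implementation.
    """
    octets = ip.split(".")
    if len(octets) != 4 or not all(
        o.isdigit() and 0 <= int(o) <= 255 for o in octets
    ):
        raise ValueError("not a dotted-quad IPv4 address: %r" % ip)
    # Full address first, then drop one trailing octet at a time.
    return [prefix + ".".join(octets[:n]) for n in range(4, 0, -1)]


if __name__ == "__main__":
    for tok in subnet_tokens("200.59.68.139"):
        print(tok)
```

If both options were to apply to header addresses, one possibility would
be to call such a helper with a different prefix (e.g. "rcvd:"), but
that is exactly the open question above.
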
> get_token: 1 "rcvd:from"
> get_token: 5 "200.59.68.139"
> get_token: 1 "rcvd:steastwood.harrison.org"
> get_token: 2 "head:Message-ID"
BTW, the numeric value corresponds to the token type (as returned by the
lexer).
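
For anyone wanting to tally the debug output by token type, a small
Python sketch along these lines may help. It assumes every line has the
form shown in the quoted output (get_token:, a number, then the token in
double quotes); the name parse_get_token is made up for the example, and
the meaning of each numeric type is deliberately left out, since it is
defined by the lexer.

```python
import re

# Matches lines such as:  get_token: 5 "200.59.68.139"
# Leading "> " quote markers from email replies are tolerated.
LINE_RE = re.compile(r'^(?:>\s*)*get_token:\s+(\d+)\s+"(.*)"$')


def parse_get_token(line):
    """Return (token_type, token) for a debug line, or None if no match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return int(m.group(1)), m.group(2)


if __name__ == "__main__":
    sample = [
        '> get_token: 1 "rcvd:from"',
        '> get_token: 5 "200.59.68.139"',
        '> get_token: 2 "head:Message-ID"',
    ]
    for line in sample:
        print(parse_get_token(line))
```
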
> I'd expect www.quick-home-loan-search.biz to show up somewhere in
> there.
Tokens are limited to 30 chars, so long URLs are excluded :-(
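
A quick length check, as a Python sketch, shows why this URL is
borderline. The 30 comes from the limit stated above; the helper fits()
and its inclusive flag are made up for the example, because whether a
token of exactly 30 chars is kept depends on how the lexer applies the
limit.

```python
MAX_TOKEN_LEN = 30  # limit quoted in this message


def fits(token, limit=MAX_TOKEN_LEN, inclusive=True):
    """Hypothetical length check; `inclusive` selects <= versus <."""
    if inclusive:
        return len(token) <= limit
    return len(token) < limit


if __name__ == "__main__":
    host = "www.quick-home-loan-search.biz"
    url = "http://www.quick-home-loan-search.biz/"
    for text in (host, url):
        print(len(text), fits(text), fits(text, inclusive=False), text)
```

The bare hostname is exactly at the limit, while the full URL with
scheme and trailing slash is well over it under either reading.
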