article on blocking by subnets
David Relson
relson at osagesoftware.com
Tue Dec 3 06:18:28 CET 2002
Gram,
I've been thinking about this and have some code. It adds a "url:" prefix
to all URLs. When get_token() encounters an IPADDR, it adds the prefix and
returns the full address (all 4 octets). The next call returns the address
less the final octet. The next call trims a second octet. The final
(fourth) call returns only the first octet. After that, get_token()
resumes normal operation.
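To make the sequence concrete, here is a minimal standalone sketch of the
trimming described above. It is not the code that went into cvs; the names
ip_expander, ip_expander_start() and ip_expander_next() are hypothetical and
chosen only for illustration, and the sketch assumes the address arrives as
a plain dotted-quad string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define URL_PREFIX "url:"
#define MAX_TOKEN  64

/* Hypothetical state for one address being expanded into subtokens. */
struct ip_expander {
    char buf[MAX_TOKEN];  /* "url:" plus the octets not yet trimmed */
    int  pending;         /* nonzero while buf holds a token to return */
};

/* Begin expanding a dotted-quad address such as "192.168.10.7". */
void ip_expander_start(struct ip_expander *st, const char *ipaddr)
{
    assert(st != NULL && ipaddr != NULL);
    snprintf(st->buf, sizeof st->buf, "%s%s", URL_PREFIX, ipaddr);
    st->pending = 1;
}

/*
 * Copy the next token into out.
 * Returns 1 if a token was produced, 0 once the address is used up
 * (the point at which a lexer would resume normal operation).
 */
int ip_expander_next(struct ip_expander *st, char *out, size_t outlen)
{
    char *dot;

    assert(st != NULL && out != NULL && outlen > 0);
    if (!st->pending)
        return 0;

    snprintf(out, outlen, "%s", st->buf);

    /* Trim the last octet so the following call returns a shorter token. */
    dot = strrchr(st->buf, '.');
    if (dot != NULL)
        *dot = '\0';
    else
        st->pending = 0;  /* only the first octet was left */

    return 1;
}
```

Given 192.168.10.7, successive calls yield url:192.168.10.7, url:192.168.10,
url:192.168 and url:192, and the fifth call reports that nothing is left.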
The new code is in cvs. Once you get it, in lexer.l change "#undef
URL_TOKENS" to "#define URL_TOKENS" to enable the new capability.
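For clarity, the change amounts to swapping one preprocessor line. The
fragment below shows only that line before and after; where it sits in
lexer.l is not shown here.

```c
/* as shipped in cvs: the new capability is disabled */
#undef URL_TOKENS

/* after the edit: the new capability is enabled */
#define URL_TOKENS
```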
Also recently added to cvs are contrib/randomtrain and
contrib/README.randomtrain. They may be useful in testing the new code to
see whether it helps identify spam (or not). randomtrain is a script from
Greg Louis that builds word lists from mistakes. See README.randomtrain
for more info on the subject. It's a pretty interesting idea.
Cheers!
David
At 10:03 PM 12/2/02, Graham Wilson wrote:
>On Mon, Dec 02, 2002 at 08:43:12PM -0500, David Relson wrote:
> > At 08:26 PM 12/2/02, Barry Gould wrote:
> > >if (token is an IP address (in form a.b.c.d) )
> > >{
> > > create a new token for each of:
> > > class C net (a.b.c.0)
> > > class B net (a.b.0.0)
> > > class A net (a.0.0.0) (dunno if this is a good idea or not)
> > >
> > > and Evaluate or Store/Update them as appropriate, in addition to
> > >the original IP
> > >}
>[...]
> > Actually, I took a look at lexer.l. It already recognizes URL's and the
> > distribution code has the beginnings of code for returning multiple tokens
> > (or subtokens) from 1 call to the lexer. Apparently I thought of doing
> > something with URL's at one time, because my private copy of bogofilter has
> > some relevant code.
> >
> > If we can figure out what is wanted, I can code it. However, I'm very
> > willing to leave the testing to others. Do I have any volunteers?
>
>if you added the code to cvs or sent me a patch against cvs, id be
>willing to run some tests.
>
>--
>gram