article on blocking by subnets

David Relson relson at osagesoftware.com
Tue Dec 3 06:18:28 CET 2002


Gram,

I've been thinking about this and have some code.  It adds a "url:" prefix 
to all urls.  When get_token() encounters an IPADDR, it adds the prefix and 
returns the full address (all 4 octets).  The next call returns the address 
less the final octet.  The next call trims a second octet.  The final 
(fourth) call returns only the first octet.  After that, get_token() 
resumes normal operation.
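
To make the sequence concrete, here is a rough sketch in Python rather 
than the flex/C that lives in lexer.l.  The name expand_ip_token() and the 
regular expression are invented for this illustration, and it assumes that 
each shortened token keeps the "url:" prefix.  It is not the code in cvs.

```python
import re

# Hypothetical pattern for a dotted-quad address; the real IPADDR rule
# in lexer.l may differ.
IPADDR = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$")


def expand_ip_token(token, prefix="url:"):
    """Return the tokens produced for one input token, in call order.

    For an IP address: the full address first, then the address with
    one, two and three trailing octets trimmed.  Any other token is
    returned unchanged.
    """
    if not IPADDR.match(token):
        return [token]
    octets = token.split(".")
    # n = 4, 3, 2, 1: keep the first n octets on each successive call.
    # Assumption: every shortened token carries the same prefix.
    return [prefix + ".".join(octets[:n]) for n in range(4, 0, -1)]


if __name__ == "__main__":
    for tok in expand_ip_token("10.20.30.40"):
        print(tok)
```

Each of the four strings corresponds to one get_token() call; a token that 
is not an IP address comes back as a single-element list.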

The new code is in cvs.  Once you get it, in lexer.l change "#undef 
URL_TOKENS" to "#define URL_TOKENS" to enable the new capability.

Also recently added to cvs are contrib/randomtrain and 
contrib/README.randomtrain.  They may be useful in testing the new code to 
see whether it helps identify spam (or not).  randomtrain is a script from 
Greg Louis that builds word lists from mistakes.  See README.randomtrain 
for more info on the subject.  It's a pretty interesting idea.

Cheers!

David

At 10:03 PM 12/2/02, Graham Wilson wrote:

>On Mon, Dec 02, 2002 at 08:43:12PM -0500, David Relson wrote:
> > At 08:26 PM 12/2/02, Barry Gould wrote:
> > >if (token is an IP address (in form a.b.c.d) )
> > >{
> > >        create a new token for each of:
> > >        class C net (a.b.c.0)
> > >        class B net (a.b.0.0)
> > >        class A net (a.0.0.0) (dunno if this is a good idea or not)
> > >
> > >        and Evaluate or Store/Update them as appropriate, in addition to
> > >the original IP
> > >}
>[...]
> > Actually, I took a look at lexer.l.  It already recognizes URL's and the
> > distribution code has the beginnings of code for returning multiple tokens
> > (or subtokens) from 1 call to the lexer.  Apparently I thought of doing
> > something with URL's at one time, because my private copy of bogofilter has
> > some relevant code.
> >
> > If we can figure out what is wanted, I can code it.  However, I'm very
> > willing to leave the testing to others.  Do I have any volunteers?
>
>if you added the code to cvs or sent me a patch against cvs, id be
>willing to run some tests.
>
>--
>gram




