[bogofilter] using block_on_subnets

Tom Allison tallison at tacocat.net
Fri Apr 30 12:39:27 CEST 2004


David Relson wrote:
> On 29 Apr 2004 07:54:23 -0400
> Tom Anderson wrote:
> 
> 
>>On Thu, 2004-04-29 at 07:36, David Relson wrote:
>>
>>>"Any" covers a lot of territory:-)  Doing all the work in bogoutil
>>>would require reading the whole wordlist and applying the wildcard.
>>
>>Would it be difficult to insert a regular expression at the datastore
>>level?  I browsed through the code, but don't feel qualified to try to
>>patch something in myself.
>>
>>
>>>A bit more complex, but using the command line's capabilities, is to
>>>use "bogoutil -d | egrep | awk print $1 | bogoutil -p".
>>
>>Ok, at which point is the wildcard token declared here?  I'm guessing
>>as an argument to egrep, but I ended up with a broken pipe :(
>>
>>Tom
> 
> 
> Tom,
> 
> I don't know if BerkeleyDB has builtin support for wildcarding...  My
> guess would be that it doesn't, but I might be wrong.
> 
> If memory serves, the command sequence is:
> 
>   bogoutil -d $path/wordlist.db \
>     | egrep "expr" \
>     | awk '{print $1}' \
>     | bogoutil -p $path/wordlist.db
> 
> Enjoy!
> 
> David
> 

I was playing a bit with this today.
I'm amazed at how many URL entries are just invalid IP addresses.
url:0                              66     252  0.318316
url:0.0                             2      25  0.125084
url:0.0.0                           0      25  0.000370
url:0.0.0.0                         0      25  0.000370
url:0.0.160                         1       0  0.991605

I'm pretty certain that these are all invalid URLs.
I'm just surprised at how many of them are also "good".
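
For anyone who wants to look for the same thing in their own wordlist, here is a rough sketch of a filter for these entries. It assumes each line is a token followed by the spam count and then the good count (which is how I read the listing above), and the function name numeric_url_tokens is just something made up for this example, not part of bogofilter.

```shell
# Hypothetical helper, not part of bogofilter.
# Reads lines of the form "token spam good ..." on stdin (column order is
# an assumption) and prints token, spam count and good count for every
# url: token that consists only of digits and dots, i.e. something that
# looks like an IP address or a fragment of one.
numeric_url_tokens() {
    awk '$1 ~ /^url:[0-9]+(\.[0-9]+)*$/ { print $1, $2, $3 }'
}
```

It should be usable as the filter stage of the pipeline quoted above, for example "bogoutil -d $path/wordlist.db | numeric_url_tokens", in place of the egrep and awk steps. Note that it prints the counts along with the token, so check the output by eye before feeding anything back into bogoutil.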


