[bogofilter] using block_on_subnets

Tom Anderson tanderso at oac-design.com
Fri Apr 30 14:16:44 CEST 2004


On Fri, 2004-04-30 at 06:33, Tom Allison wrote:
> Just for grins, I rebuilt my wordlist using this subnets option
> enabled.
> My current database is about 50% larger than it was previously.
> It will be interesting to see how it grows in the coming week.

Using http://www.orderamidchaos.com/bogofilter/spamitarium, I've been
inserting Autonomous System Numbers (ASNs) in all of my received lines,
as this achieves the same basic goal as block_on_subnets, without the
huge wordlist bloat.
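
To illustrate the idea, here is a minimal Python sketch of tagging a Received line with an ASN. It is not spamitarium's code: the prefix table, the function names, and the placement of the tag at the end of the line are all assumptions made for the example. The only thing taken from this post is that the resulting tokens look like rcvd:as6478.

```python
import ipaddress

# Hypothetical prefix-to-ASN table. A real setup would fill this from
# routing data; the single entry here is only for illustration.
PREFIX_TO_ASN = {
    ipaddress.ip_network("8.8.8.0/24"): 15169,
}

def asn_for(ip):
    """Return the ASN of the longest matching prefix, or None if unknown."""
    addr = ipaddress.ip_address(ip)
    matches = [(net.prefixlen, asn)
               for net, asn in PREFIX_TO_ASN.items() if addr in net]
    return max(matches)[1] if matches else None

def tag_received(line, ip):
    """Append an 'asNNNN' word to a Received line (tag format is assumed)."""
    asn = asn_for(ip)
    if asn is None:
        return line  # leave the line untouched when no ASN is known
    return f"{line} (as{asn})"
```

Every subnet announced by the same network collapses into one token this way, which is why the wordlist stays small.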

> I'm pretty certain that these are all invalid URLs.
> I'm just surprised at how many of them are also "good".

It's also possible that 0, 0.0, etc., were used in a context other than
a URL, and bogofilter parsed them incorrectly.  Some MTAs might also
print 0.0.0.0 when they fail to look up the correct IP.  My program
throws away Received lines without a valid IP, in particular lines whose
IPs are in reserved ranges, impossible, or local.
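
As a rough sketch of that kind of filtering (in Python, not the actual program, and with an assumed function name and an assumed "first usable IPv4 address wins" policy), the standard ipaddress module can do most of the classification:

```python
import ipaddress
import re

# Candidate dotted quads; range checking is left to ipaddress below.
IPV4_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")

def usable_ip(received_line):
    """Return the first routable IPv4 address in the line, or None."""
    for candidate in IPV4_RE.findall(received_line):
        try:
            ip = ipaddress.IPv4Address(candidate)
        except ValueError:
            continue  # impossible address such as 300.1.2.3
        if (ip.is_unspecified or ip.is_loopback or ip.is_private
                or ip.is_link_local or ip.is_reserved or ip.is_multicast):
            continue  # 0.0.0.0, local, or reserved ranges
        return ip
    return None  # caller would throw this Received line away
```

A line such as "from x ([0.0.0.0])" yields None here, so it would never produce an ASN token.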

> If it seems reasonable enough to start looking into more feasible
> studies, then it might make sense to collect bogofilter wordlist
> information from other people's ^url: listings to see if there is
> sufficient and consistent overlap to provide a reliable means of
> detection.

As I said before, I'm not using the block_on_subnets option due to the
disk space considerations, but ASNs can achieve the same goal.  Here are
some of my top scorers (>25 seen):

                                 spam    good    Fisher

rcvd:as4294967295                 691       0  0.999988
rcvd:as6478                       408       0  0.999979
rcvd:as7132                       242       1  0.873450
rcvd:as3561                       224      11  0.367469
rcvd:as22909                      166      23  0.170772
rcvd:as4134                       120       0  0.999929
rcvd:as3356                       116       4  0.452753
rcvd:as13749                       12     106  0.003297
rcvd:as852                         97       0  0.999912
rcvd:as17676                       88       1  0.715094
rcvd:as6128                        84       0  0.999898
rcvd:as701                         10      70  0.004174
rcvd:as30092                       79       0  0.999892
rcvd:as3549                        74       0  0.999885
rcvd:as4766                        67       0  0.999873
rcvd:as7018                        64       3  0.378373
rcvd:as20115                       64       0  0.999867
rcvd:as8151                        64       0  0.999867
rcvd:as7015                        60       0  0.999858
rcvd:as9304                        60       0  0.999858
rcvd:as7738                        58       0  0.999853
rcvd:as4812                        55       1  0.610716
rcvd:as9318                        55       0  0.999845
rcvd:as4837                        54       0  0.999842
rcvd:as6327                        54       0  0.999842
rcvd:as30033                       52       0  0.999836
rcvd:as27699                       45       0  0.999810
rcvd:as13571                       43       0  0.999801
rcvd:as7843                        42       0  0.999797
rcvd:as9277                        40       0  0.999786
rcvd:as9381                        40       0  0.999786
rcvd:as2828                        36       1  0.506663
rcvd:as27382                        0      34  0.000272
rcvd:as812                         33       0  0.999741
rcvd:as11351                       32       0  0.999733
rcvd:as3320                        32       0  0.999733
rcvd:as3786                        30       0  0.999715
rcvd:as11426                       24       4  0.146387
rcvd:as12271                       29       1  0.452783
rcvd:as209                         27       3  0.204482
rcvd:as1785                         1      28  0.001336
rcvd:as3215                        28       0  0.999695
rcvd:as16966                        1      27  0.001385
rcvd:as12322                       27       0  0.999684
rcvd:as11388                       26       0  0.999672
rcvd:as11427                       26       0  0.999672

Even the less frequently seen ones (1-25 times) are quite polarized one
way or the other.  I probably have fewer than 200 ASNs in total in my
wordlist, which contributes very little to its overall size.
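
For readers wondering how the Fisher column relates to the counts, here is a small Python sketch. It assumes the Robinson f(w) formula with robs = 0.0178 and robx = 0.52; neither value is stated in this post, and the function name is made up for the example. The per-token probability also depends on the total spam and good message counts, which the table does not give, but those totals cancel for tokens seen on only one side.

```python
ROBS = 0.0178  # assumed strength of the prior (not stated in the post)
ROBX = 0.52    # assumed score for a never-seen token (not stated in the post)

def fisher_score(spam, good, spam_msgs, good_msgs, robs=ROBS, robx=ROBX):
    """Robinson's f(w): the raw spam probability pulled toward robx."""
    n = spam + good
    if n == 0:
        return robx  # no evidence at all
    spam_rate = spam / spam_msgs
    good_rate = good / good_msgs
    p = spam_rate / (spam_rate + good_rate)  # raw probability p(w)
    return (robs * robx + n * p) / (robs + n)

# One-sided tokens: the message totals (1000 here, arbitrary) cancel out.
print(round(fisher_score(691, 0, 1000, 1000), 6))  # rcvd:as4294967295
print(round(fisher_score(0, 34, 1000, 1000), 6))   # rcvd:as27382
```

Under those assumptions the two calls give 0.999988 and 0.000272, matching the corresponding rows in the table above.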

Tom
