[bogofilter] using block_on_subnets
Tom Anderson
tanderso at oac-design.com
Fri Apr 30 14:16:44 CEST 2004
On Fri, 2004-04-30 at 06:33, Tom Allison wrote:
> Just for grins, I rebuilt my wordlist with this subnets option
> enabled.
> My current database is about 50% larger than it was previously.
> It will be interesting to see how it grows in the coming week.
Using spamitarium (http://www.orderamidchaos.com/bogofilter/spamitarium),
I've been inserting Autonomous System Numbers (ASNs) into all of my
Received lines. This achieves the same basic goal as block_on_subnets
without the huge wordlist bloat.
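
To illustrate why ASN tokens stay compact, here is a minimal Python sketch of
mapping an IP to a single "rcvd:asNNN" token by longest-prefix match. This is
not spamitarium's code. The prefix table, the function name asn_token, and the
ASNs (taken from the documentation ranges) are all hypothetical; only the
token shape comes from my wordlist.

```python
import ipaddress

# Hypothetical prefix-to-ASN table. A real one would come from routing or
# whois data; these prefixes and ASNs are documentation-only placeholders.
PREFIX_TO_ASN = {
    ipaddress.ip_network("192.0.2.0/24"): 64496,
    ipaddress.ip_network("198.51.100.0/24"): 64497,
}


def asn_token(ip_text, table=PREFIX_TO_ASN):
    """Return a 'rcvd:asNNN' token for ip_text, or None if no prefix matches."""
    ip = ipaddress.ip_address(ip_text)
    matches = [(net, asn) for net, asn in table.items() if ip in net]
    if not matches:
        return None
    # Prefer the most specific (longest) matching prefix.
    _, asn = max(matches, key=lambda m: m[0].prefixlen)
    return f"rcvd:as{asn}"
```

Every address inside a prefix collapses to one token, so the wordlist grows
with the number of networks seen rather than the number of addresses.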
> I'm pretty certain that these are all invalid URLs.
> I'm just surprised at how many of them are also "good"
It's also possible that 0, 0.0, etc., were used in a context other than
a URL and bogofilter parsed them incorrectly. Some MTAs might also print
0.0.0.0 when they fail to look up the correct IP. My program throws away
Received lines that lack a valid IP, in particular those with IPs in
reserved ranges, impossible IPs, and local IPs.
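
A rough Python sketch of that kind of filtering, using the standard ipaddress
module. It is not the actual program. The function names and the regular
expression are assumptions made for illustration, and it only looks at IPv4
dotted quads.

```python
import ipaddress
import re

# Loose pattern for dotted-quad candidates; real validity is checked below.
IPV4_CANDIDATE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")


def usable_ip(text):
    """True if text is a syntactically valid, publicly routable-looking IP."""
    try:
        ip = ipaddress.ip_address(text)
    except ValueError:
        # Impossible addresses such as 999.1.1.1 end up here.
        return False
    return not (
        ip.is_unspecified   # 0.0.0.0
        or ip.is_private    # RFC 1918 and similar local ranges
        or ip.is_loopback   # 127.0.0.0/8
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
    )


def first_usable_ip(received_line):
    """Return the first usable IP found in a Received line, else None."""
    for candidate in IPV4_CANDIDATE.findall(received_line):
        if usable_ip(candidate):
            return candidate
    return None
```

A Received line for which first_usable_ip returns None would simply be
dropped before tokenizing.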
> If it seems reasonable enough to start looking into more feasible
> studies, then it might make sense to collect bogofilter wordlist
> information from other people's ^url: listings to see if there is
> sufficient and consistent overlap to provide a reliable means of
> detection.
As I said before, I'm not using the block_on_subnets option due to the
disk space considerations, but ASNs can achieve the same goal. Here are
some of my top scorers (>25 seen):
token              spam  good  Fisher
rcvd:as4294967295   691     0  0.999988
rcvd:as6478         408     0  0.999979
rcvd:as7132         242     1  0.873450
rcvd:as3561         224    11  0.367469
rcvd:as22909        166    23  0.170772
rcvd:as4134         120     0  0.999929
rcvd:as3356         116     4  0.452753
rcvd:as13749         12   106  0.003297
rcvd:as852           97     0  0.999912
rcvd:as17676         88     1  0.715094
rcvd:as6128          84     0  0.999898
rcvd:as701           10    70  0.004174
rcvd:as30092         79     0  0.999892
rcvd:as3549          74     0  0.999885
rcvd:as4766          67     0  0.999873
rcvd:as7018          64     3  0.378373
rcvd:as20115         64     0  0.999867
rcvd:as8151          64     0  0.999867
rcvd:as7015          60     0  0.999858
rcvd:as9304          60     0  0.999858
rcvd:as7738          58     0  0.999853
rcvd:as4812          55     1  0.610716
rcvd:as9318          55     0  0.999845
rcvd:as4837          54     0  0.999842
rcvd:as6327          54     0  0.999842
rcvd:as30033         52     0  0.999836
rcvd:as27699         45     0  0.999810
rcvd:as13571         43     0  0.999801
rcvd:as7843          42     0  0.999797
rcvd:as9277          40     0  0.999786
rcvd:as9381          40     0  0.999786
rcvd:as2828          36     1  0.506663
rcvd:as27382          0    34  0.000272
rcvd:as812           33     0  0.999741
rcvd:as11351         32     0  0.999733
rcvd:as3320          32     0  0.999733
rcvd:as3786          30     0  0.999715
rcvd:as11426         24     4  0.146387
rcvd:as12271         29     1  0.452783
rcvd:as209           27     3  0.204482
rcvd:as1785           1    28  0.001336
rcvd:as3215          28     0  0.999695
rcvd:as16966          1    27  0.001385
rcvd:as12322         27     0  0.999684
rcvd:as11388         26     0  0.999672
rcvd:as11427         26     0  0.999672
Even the less frequently seen ASNs (1-25 times) are quite polarized one
way or the other. I probably have fewer than 200 ASNs in total in my
wordlist, which adds very little to its overall size.
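
For anyone wondering how the Fisher column relates to the counts: the
one-sided rows are consistent with Robinson's smoothed token probability
f(w) = (s*x + n*p) / (s + n), taking s = 0.0178 and x = 0.52. I believe those
are bogofilter's default robs and robx values, but treat that as an
assumption. The sketch below is illustrative only. The function and parameter
names are made up, and the message totals are placeholders, since the real
totals are not given here (they only matter for tokens seen in both spam and
good mail).

```python
ROBS = 0.0178  # assumed strength of the prior (bogofilter's robs)
ROBX = 0.52    # assumed score for a never-seen token (bogofilter's robx)


def token_score(spam_count, good_count, spam_msgs, good_msgs, s=ROBS, x=ROBX):
    """Robinson-style smoothed spam probability for one token."""
    n = spam_count + good_count
    if n == 0:
        return x  # no evidence at all: fall back to the prior
    spam_rate = spam_count / spam_msgs
    good_rate = good_count / good_msgs
    p = spam_rate / (spam_rate + good_rate)  # raw spam probability
    return (s * x + n * p) / (s + n)         # pull p toward x when n is small


# One-sided tokens do not depend on the (placeholder) message totals.
print(round(token_score(408, 0, 1000, 1000), 6))  # rcvd:as6478
print(round(token_score(0, 34, 1000, 1000), 6))   # rcvd:as27382
```

With those assumed constants, the two printed values round to the same
figures as the rcvd:as6478 and rcvd:as27382 rows.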
Tom