Templates [was: Prediction ...]
tanderso at oac-design.com
Fri Jul 2 09:49:54 EDT 2004
From: "Tom Allison" <tallison at tacocat.net>
> What I've seen is that >>99% of my spam is from ip addresses that only
> send one message over 4 months time. So there is little net improvement
> on detecting spam since almost every IP address will be dominated on the
> basis of robx/robs settings. So there is not much effect on detecting
This discussion is not about detecting IPs for filtering, but for outputting
IPs in the logs so that people can use them for a blacklist. Bogofilter
already uses IPs for filtering. No big deal there, since it's just another
token taken together with the rest of them in the message. But if you're
going to single out a token and say, "this is definitely the IP of the
connecting mail server, use it in your blacklist," then it's much more
important to be absolutely certain that it really is. That uncertainty, and
the difficulty of obtaining any measure of certainty, is the heart of this
discussion. I've suggested not adding this functionality because of it.
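To make the uncertainty concrete, here is a rough sketch in Python (not
bogofilter code; the function name, the sample message, and the assumption
that the relay IP appears in square brackets are all mine). It pulls the
bracketed IP out of each Received header. Only the topmost header is written
by your own mail server; everything below it is whatever the sender chose to
put there, and even "topmost" only holds if no other local relay or fetcher
prepended a header first.

```python
import re
from email import message_from_string

# Matches a bracketed IPv4 address such as [192.0.2.7] in a Received header.
# Assumption: the receiving MTA records the peer address in this form.
BRACKETED_IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def received_ips(raw_message):
    """Return the first bracketed IPv4 of each Received header, top to bottom.

    Headers with no bracketed address yield None. This is a hypothetical
    helper for illustration, not part of bogofilter or spamitarium.
    """
    msg = message_from_string(raw_message)
    ips = []
    for header in msg.get_all("Received", []):
        match = BRACKETED_IPV4.search(header)
        ips.append(match.group(1) if match else None)
    return ips


# Made-up sample: the first header is what the local server would add,
# the second could have been forged by the sender.
SAMPLE = (
    "Received: from mail.example.net (mail.example.net [192.0.2.7])\n"
    "\tby mx.local.example with ESMTP; Fri, 2 Jul 2004 09:00:00 -0400\n"
    "Received: from trusted.example.org (trusted.example.org [198.51.100.9])\n"
    "\tby mail.example.net with SMTP; Fri, 2 Jul 2004 08:59:00 -0400\n"
    "Subject: test\n"
    "\n"
    "body\n"
)

ips = received_ips(SAMPLE)
print("candidate for blacklist:", ips[0])
print("unverifiable, possibly forged:", ips[1:])
```

Nothing in the message itself tells the parser which of these addresses to
trust; that knowledge has to come from outside, which is why I'd rather not
have bogofilter label one of them as authoritative.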
Also, I'd imagine this would be more important for a multi-user setup, where
the first copy of a spam to be registered can block the same message for the
rest of the users, than for blocking future spams to the same user.
> And all this ASN/header_stripping that spamitarium managed to do didn't
> have much net effect on 30,000 emails that I studied. I posted all of
> these results on the mailing list months ago with little response.
I'm still wary about the results you've provided. They didn't seem to show
any effect on scoring at all, even though tests on single emails show a
large effect. But I've yet to run my own structured experiment to show
otherwise, just the experience from using it on my own email where it has
been very successful. The one thing that your experiment did show, however,
was that the wordlist size was about 15% smaller using spamitarium with -s,
even if your scoring didn't change. In any event, that's not really what
we're talking about in this thread.
> Personally, I'm a little leery of all the new features that are being
> pushed into bogofilter. We still don't have a good understanding of ESF
I agree. Perfecting what exists is more important than adding new stuff.
The focus should be on improving scoring, streamlining code, speeding up
execution, reducing wordlist size, improving documentation, etc. Ancillary
stuff like parsing particular pieces of information for external use should
be done externally. Bloat will kill any program.