[bogofilter] interesting paper
Tom Allison
tallison at tacocat.net
Wed May 5 17:47:23 EDT 2004
Tom Anderson wrote:
> From: "Richard Kimber" <rkimber at ntlworld.com>
>
>>http://crm114.sourceforge.net/Plateau_Paper.html
>>
>>entitled: The Spam-Filtering Accuracy Plateau at 99.9% Accuracy and How
>>to Get Past It.
>
>
> I agree with the idea that shared resources are useful, but not in a
> blacklisting sense. It's odd that the author would "solve" the statistical
> plateau with a non-statistical approach. Instead, I'd like to see the
> ability for bogofilter to accept several URLs (perhaps with passwords) in
> its configuration for corroborative or supplementary wordlists. That is,
> look up all of the tokens in your own local wordlist first, but if you
> haven't seen a token before, then instead of using robx (unless you have no
> URLs specified or no connection), look up that particular token in your
> first supplementary wordlist. If it isn't there, or you time out, then move
> to the next one. This way, you could share with your friends or colleagues,
> but in the order in which you think your email is most similar, with
> failover. This would also allow an organization to maintain a global
> wordlist, perhaps from a spam trap, and keep local wordlists for each user
> that only represent the deviation from the global one, thus significantly
> reducing their size.
>
> Tom
>
I could imagine that sharing IP/ASN information between different
users might make a contribution that depends less on one's personal
interpretation of spam. Recall that bogofilter is highly tuned to one
person's email and context. IP/ASN information, by contrast, might be
more independent of the individual user than other data.
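
For reference, the failover lookup proposed in the quoted message could be sketched roughly as below. This is only an illustration in Python, not bogofilter code: the names token_spamicity, Lookup, and ROBX_ASSUMED are hypothetical, each wordlist is modelled as a plain function from a token to (spam count, ham count), remote wordlists are assumed to signal failure by raising TimeoutError or ConnectionError, and the score is a raw count ratio rather than bogofilter's actual calculation.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical model: a wordlist lookup returns (spam_count, ham_count),
# or None when the token has never been seen in that wordlist.
Counts = Tuple[int, int]
Lookup = Callable[[str], Optional[Counts]]

# Illustrative fallback score for unknown tokens. This value is an
# assumption for the sketch, not a statement of bogofilter's default robx.
ROBX_ASSUMED = 0.5


def token_spamicity(
    token: str,
    local: Lookup,
    supplementary: Sequence[Lookup] = (),
    robx: float = ROBX_ASSUMED,
) -> float:
    """Score a token from the first wordlist that knows it.

    The local wordlist is consulted first, then each supplementary
    wordlist in the user's preferred order. A wordlist that times out or
    is unreachable is skipped. robx is used only if no wordlist has the
    token (or no supplementary wordlists are configured).
    """
    for lookup in (local, *supplementary):
        try:
            counts = lookup(token)
        except (TimeoutError, ConnectionError):
            continue  # failover: try the next wordlist
        if counts is None:
            continue  # token unseen here: try the next wordlist
        spam, ham = counts
        if spam + ham > 0:
            # Simplified score: fraction of sightings that were spam.
            return spam / (spam + ham)
    return robx
```

In this sketch a token present in the local wordlist never triggers a remote lookup, so supplementary wordlists only cost network time for tokens the user has not seen before.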