[bogofilter] interesting paper

Tom Allison tallison at tacocat.net
Wed May 5 17:47:23 EDT 2004


Tom Anderson wrote:
> From: "Richard Kimber" <rkimber at ntlworld.com>
> 
>>http://crm114.sourceforge.net/Plateau_Paper.html
>>
>>entitled: The Spam-Filtering Accuracy Plateau at 99.9% Accuracy and How
>>to Get Past It.
> 
> 
> I agree with the idea that shared resources are useful, but not in a
> blacklisting sense.  It's odd that the author would "solve" the statistical
> plateau with a non-statistical approach.  Instead, I'd like to see the
> ability for bogofilter to accept several URLs (perhaps with passwords) in
> its configuration for corroborative or supplementary wordlists.  That is,
> look up all of the tokens in your own local wordlist first, but if you
> haven't seen a token before, then instead of using robx (unless you have no
> URLs specified or no connection), look up that particular token in your
> first supplementary wordlist.  If it isn't there, or you timeout, then move
> to the next one.  This way, you could share with your friends or colleagues,
> but in the order in which you think your email is most similar, with
> failover.  This would also allow an organization to maintain a global
> wordlist, perhaps from a spam trap, and keep local wordlists for each user
> that only represent the deviation from the global one, thus significantly
> reducing their size.
> 
> Tom
> 
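The lookup order proposed in the quoted message can be sketched in Python. This is a minimal sketch under stated assumptions, not bogofilter code. The wordlist shapes are hypothetical: a local dict of (spam, ham) counts, and remote lists modeled as callables that may time out. The name `lookup_token` is also hypothetical. Authentication and transport, which the message leaves open, are not modeled. A `(None, None)` result means no list knew the token, so the caller falls back to robx.

```python
from typing import Callable, Mapping, Optional, Sequence, Tuple

# Hypothetical record shape: (spam_count, ham_count) for one token.
Counts = Tuple[int, int]

# Hypothetical remote wordlist: returns counts, or None if the token is
# unknown there. It may raise OSError (TimeoutError is a subclass) when
# the list cannot be reached.
RemoteList = Callable[[str], Optional[Counts]]


def lookup_token(
    token: str,
    local: Mapping[str, Counts],
    remotes: Sequence[RemoteList],
) -> Tuple[Optional[Counts], Optional[str]]:
    """Return (counts, source) for a token.

    Lookup order: the local wordlist first, then each supplementary list
    in the user's preferred order. A miss or an unreachable list moves on
    to the next one. (None, None) means no list knew the token, so the
    caller should fall back to robx.
    """
    counts = local.get(token)
    if counts is not None:
        return counts, "local"
    for index, remote in enumerate(remotes):
        try:
            counts = remote(token)
        except OSError:
            # Timeout or connection failure: fail over to the next list.
            continue
        if counts is not None:
            return counts, f"remote[{index}]"
    return None, None
```

If an organization-wide list is placed last in `remotes`, each user's local list only needs entries where that user's counts should override the shared ones. That is the size reduction the message describes.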

I could imagine that sharing IP/ASN information between different 
users might contribute something that is more independent of any one 
user's personal interpretation of spam.  Recall that bogofilter is highly 
tuned to one person's email and context.  IP/ASN information, however, 
may be less tied to that individual context than other tokens are.

