Database Size versus Shannon's Word Entropy

RW rwmaillists at googlemail.com
Wed Oct 25 15:59:19 CEST 2017


On Tue, 24 Oct 2017 23:14:18 +0200
Matthias Andree wrote:

> On 24.10.2017 at 22:59, Rick van Rein wrote:
> 
> > But my reason for wondering about database size is that I am also
> > thinking about splitting them across users, such as a separate spam
> > filter for aliases like rick+bboy at example.com that cover an area of
> > interest for the mail user.  Or IMAP subfolders.
> 
> The thing is, spam doesn't care much about the recipient. Somewhat
> targeted spam sent through lists might, of course, but on the whole
> I think it hinges on the scale effects of sending millions of
> messages dirt cheap more than on anything else.

The point of doing that is not to tailor the database to specific spam
but to restrict the vocabulary of the ham. With a narrower ham
vocabulary, typical spam tokens are less likely to also show up in ham,
which can make it much easier to classify a message as spam.
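
A toy sketch of that effect, in Python. This is not bogofilter's actual
scoring, just a simple per-token frequency ratio, and all the word
counts are invented for illustration. The function name
token_spamicity and the two ham wordlists (one shared by all mail, one
for a single narrow-topic alias) are hypothetical.

```python
from collections import Counter


def token_spamicity(token, spam_counts, ham_counts, n_spam, n_ham):
    """Share of a token's per-message frequency that comes from spam.

    Returns a value in [0, 1]; 0.5 if the token was never seen.
    """
    s = spam_counts[token] / n_spam  # Counter yields 0 for unseen tokens
    h = ham_counts[token] / n_ham
    if s + h == 0:
        return 0.5
    return s / (s + h)


# Invented counts: number of messages containing each token.
spam = Counter({"offer": 80, "cheap": 90, "kernel": 1})
n_spam = 100

# Ham from all mail for the user: broad vocabulary, overlaps with spam.
ham_all = Counter({"offer": 30, "cheap": 10, "kernel": 20})
n_ham_all = 100

# Ham for one narrow-topic alias only: restricted vocabulary.
ham_alias = Counter({"kernel": 40, "patch": 35})
n_ham_alias = 50

for tok in ("offer", "cheap"):
    shared = token_spamicity(tok, spam, ham_all, n_spam, n_ham_all)
    narrow = token_spamicity(tok, spam, ham_alias, n_spam, n_ham_alias)
    print(f"{tok}: shared={shared:.2f} narrow={narrow:.2f}")
```

With these made-up numbers, "offer" and "cheap" score higher against
the narrow ham list than against the shared one, because they never
occur in that alias's ham.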

  


