Database Size versus Shannon's Word Entropy
Rick van Rein
rick at openfortress.nl
Tue Oct 24 22:59:20 CEST 2017
Hi,
> Bogoutil also allows the user to filter out seen-only-once tokens
> (lower-case -c option -- by age, see the -a option).
Thanks for pointing that out!
> Not sure if it
> really matters much: 44 MB seems small enough these days (it sure wasn't
> when I built my first Linux PC on DX4 basis in the late 1990s).
I started on a ZX Spectrum and have always thought 48 kB was a whole lot :)
But my reason for wondering about database size is that I am also
thinking about splitting the databases across users, such as a separate
spam filter for aliases like rick+bboy at example.com that cover an area
of interest for the mail user, or for IMAP subfolders.
Bogofilter is likely to be useful to sort email into the right alias
and/or IMAP subfolder (including ones for Spam and Unsure). But that
would not allow for the light-weight alias [0] support that we're after
for our IdentityHub project [1]. That is what got me thinking about, and
playing with, the database size.
I did notice that the number of cases is a #define now set to 2, so a
more practical approach to this alias sorting idea could be to simply
have larger records with counters for each alias. [But that would make
it a -dev discussion I suppose.]
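
To make the "larger records" idea concrete, here is a rough sketch in
Python rather than in bogofilter's own C. It is not bogofilter's actual
record format or API: the class list, the TokenDB name and the choice to
count a token once per message are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical class list: the two existing cases plus one slot per alias.
# In the real code this would correspond to raising the #define above 2.
CLASSES = ["spam", "ham", "rick+bboy", "rick+lists"]


def new_record():
    """One record per token: a fixed-size list with one counter per class."""
    return [0] * len(CLASSES)


class TokenDB:
    """Illustrative in-memory stand-in for a single shared token database."""

    def __init__(self):
        self.records = defaultdict(new_record)

    def register(self, tokens, cls):
        """Count each distinct token of one message once for class `cls`."""
        idx = CLASSES.index(cls)
        for tok in set(tokens):  # assumption: once per message, not per hit
            self.records[tok][idx] += 1

    def counts(self, token):
        """Return a copy of the counters; all zeros for an unseen token."""
        if token in self.records:
            return list(self.records[token])
        return new_record()


db = TokenDB()
db.register(["bboy", "battle", "bboy"], "rick+bboy")
db.register(["battle", "viagra"], "spam")
print(db.counts("battle"))  # one count under "spam", one under "rick+bboy"
```

The trade-off is that every token key is stored only once, but each
record grows by one counter per alias, including for the tokens that an
alias never sees.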
Thanks,
-Rick
[0] http://internetwide.org/blog/2015/04/23/id-3-idforms.html
[1] http://internetwide.org/blog/2016/06/24/iwo-phases.html