Database Size versus Shannon's Word Entropy

Rick van Rein rick at openfortress.nl
Wed Oct 25 22:58:41 CEST 2017


Hey Matthias,

Thanks again.

> Do you mean the regular exponential decay of the f(t) = a0 *
> exp(-\lambda *t) kind?

Yep.

> Bogofilter isn't designed to do that. It does that three-state thing,
> spam/ham/dunno,

Thanks for helping to clear that up.

Throughout this discussion, I've also come to the conclusion that I've
been trying to straddle two different goals.  Bogofilter wants to know
many more words than end users do (including header stuff), whereas the
thing I have in mind has different concerns and is open to more
guesswork.  So if I'm going to design anything along the lines of
splitting message streams over aliases, I'd best do it as a second
stage.  Similar algorithms may be useful, but differently tuned.


Bogofilter is a really awesome piece of code, and it's great for what
it was meant to do, but not for the patterns I had in mind :)


Cheers,
 -Rick

