New (?) idea to optimize database

Boris 'pi' Piwinger 3.14 at piology.org
Sat Mar 18 16:04:54 CET 2006


Hi!

We have had lengthy discussions about how to optimize (= minimize)
the database to get the best performance. That is why I created
bogominitrain. Clearly, though, it will still collect some useless
tokens. So here is an idea to improve on it:

Run bogominitrain, then remove all tokens which show up only once
in the training body (to find them, full training is needed in
a separate body). Also prevent those tokens from being added
again, and run bogominitrain again. Repeat until it has converged.
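
A rough sketch of that loop, in Python rather than against bogofilter
itself. Everything here is an assumption made for illustration: the
scorer is a toy, train_on_error() is only a stand-in for what
bogominitrain does (train a message only when it is misclassified),
and all function names are made up.

```python
from collections import Counter


def tokenize(text):
    """Toy tokenizer: the set of lowercase whitespace-separated words."""
    return set(text.lower().split())


def full_counts(corpus):
    """Full training in a separate body: messages containing each token."""
    counts = Counter()
    for text, _is_spam in corpus:
        counts.update(tokenize(text))
    return counts


def classify(db, tokens):
    """Toy scorer (not bogofilter's): spam if spam counts outweigh ham counts."""
    score = sum(db[t][0] - db[t][1] for t in tokens if t in db)
    return score > 0


def train_on_error(corpus, blocked, max_passes=50):
    """Stand-in for bogominitrain: add a message only when it is misclassified."""
    db = {}  # token -> [spam count, ham count]
    for _ in range(max_passes):
        errors = 0
        for text, is_spam in corpus:
            tokens = tokenize(text) - blocked  # blocked tokens are never added
            if classify(db, tokens) != is_spam:
                errors += 1
                for t in tokens:
                    db.setdefault(t, [0, 0])[0 if is_spam else 1] += 1
        if errors == 0:
            break
    return db


def prune_until_converged(corpus):
    """Retrain, blocking tokens seen only once, until none are left in the db."""
    counts = full_counts(corpus)
    blocked = set()
    while True:
        db = train_on_error(corpus, blocked)
        singles = {t for t in db if counts[t] == 1}
        if not singles:
            return db, blocked
        blocked |= singles  # grows every round, so the loop must terminate
```

The outer loop stops because the blocked set only grows and the
vocabulary is finite. The cost is one complete train-on-error run per
round, which is where the expense comes from.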

Clearly this is extremely expensive, and I have no real idea how to
implement it, but it should give a really powerful database.

pi


