[bogofilter] Filter twice: Global wordlist, then small personal wordlist
cfortune at telus.net
Fri Apr 23 17:39:53 EDT 2004
In the never-ending quest to provide perfect spam filtering at low cost...
I now have a global wordlist that filters most spam for most users. Great! Thanks, guys.
I would like users to be able to train it, but some users have proven themselves to be horribly unreliable in their judgements. So,
rather than create some sort of "user reliability quotient (R.Q.)" and put out their brush fires, I would like each of them to have
their own wordlist. That way, if they screw up their re-classification, they only bodge their own mail. I also hope that these
personal wordlists will be small, light, and dead accurate.
My question: the personal wordlist will begin with just a few messages registered. When is it safe to use it for classification? How
many messages must be registered before it is stable for one person's mail?
If the personal wordlist's message counts are imbalanced (say, more spam than ham), what is the significance of this?
Later on, I can assess each personal wordlist for accuracy. Can a small wordlist be merged with a larger one?
Idealistic question: is it possible to temporarily merge two wordlists in memory, like this imaginary command:
bogofilter -d/path/to/global.wordlist.db -d/path/to/user.wordlist.db --merge < test.eml
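
To make the idea concrete, here is a rough Python sketch of what I imagine such an in-memory merge would do. It is only an illustration, not anything bogofilter does today: it assumes each wordlist has been dumped to text with one token per line followed by its spam count and ham count (that layout is my assumption), and parse_dump and merge_tables are hypothetical names I made up.

```python
def parse_dump(text):
    """Parse lines of 'token spam_count ham_count [extra fields]'.

    Returns a dict mapping token -> (spam_count, ham_count).
    The line layout is an assumption made for this sketch.
    """
    table = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        token, spam, ham = parts[0], int(parts[1]), int(parts[2])
        table[token] = (spam, ham)
    return table


def merge_tables(global_table, user_table):
    """Return a new table with the per-token counts of both tables summed.

    Neither input is modified, so the merge is temporary: nothing is
    written back to either wordlist.
    """
    merged = dict(global_table)
    for token, (spam, ham) in user_table.items():
        g_spam, g_ham = merged.get(token, (0, 0))
        merged[token] = (g_spam + spam, g_ham + ham)
    return merged


if __name__ == "__main__":
    # Made-up sample data, purely for illustration.
    global_dump = "viagra 50 1\nmeeting 3 40\n"
    user_dump = "viagra 2 0\ninvoice 0 4\n"
    merged = merge_tables(parse_dump(global_dump), parse_dump(user_dump))
    for token in sorted(merged):
        print(token, *merged[token])
```

Summing raw counts means the larger list dominates the result, so a tiny personal list would barely move the scores. Whether the personal counts should be weighted more heavily is exactly the sort of thing I am unsure about.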