How to deal with extremely high spam levels
Boris 'pi' Piwinger
3.14 at piology.org
Sat Jul 10 10:03:35 EDT 2004
Bob Vincent <bobvin at pillars.net> wrote:
>I'm running bogominitrain once a day to update my database.
I usually do this only when I encounter an error. That may
happen within a week, or later.
>If it doesn't close off in 3 or 4 runs, it's usually a misclassified
It is a good idea to always look at the messages used for
That many runs indicate -- as you say -- a
misclassification. Or it might be due to your very small
number of messages.
>So when that happens, I restore the database, correct the
>error, and re-run bogominitrain.
>Once a week, I delete my database,
Do you mean you start from scratch?
>register all of my hams (I still
>don't have over 1000 of them), and run bogominitrain again. Then I
>cat all the bogominitrain.spam.* files together, sort by date, and
>overwrite my spam folder with them.
I recommend keeping more messages for training.
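For reference, the merge step quoted above (concatenate the
bogominitrain.spam.* files, sort by date, overwrite the spam folder)
could be sketched roughly as follows. This is only an illustration and
rests on assumptions: that the files and the spam folder are in mbox
format, and that sorting by the Date header is what is wanted. The
function names and the paths in the usage note are hypothetical, not
part of bogofilter or bogominitrain.

```python
import mailbox
import os
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def message_date(msg):
    """Return the Date header as an aware datetime; epoch if missing or unparsable."""
    fallback = datetime.fromtimestamp(0, tz=timezone.utc)
    raw = msg.get("Date")
    if raw is None:
        return fallback
    try:
        parsed = parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        return fallback
    if parsed.tzinfo is None:
        # A "-0000" zone yields a naive datetime; treat it as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def merge_mboxes_by_date(source_paths, dest_path):
    """Merge several mbox files (assumed format) into dest_path, oldest first.

    Any existing file at dest_path is overwritten. Returns the message count.
    """
    messages = []
    for path in source_paths:
        box = mailbox.mbox(path, create=False)
        try:
            messages.extend(box)  # iterating a mailbox yields its messages
        finally:
            box.close()

    messages.sort(key=message_date)

    # Write to a temporary file first, then replace the destination,
    # so a failure halfway through does not truncate the spam folder.
    tmp_path = dest_path + ".tmp"
    if os.path.exists(tmp_path):
        os.remove(tmp_path)
    out = mailbox.mbox(tmp_path)
    try:
        for msg in messages:
            out.add(msg)
        out.flush()
    finally:
        out.close()
    os.replace(tmp_path, dest_path)
    return len(messages)
```

It might be called as
merge_mboxes_by_date(sorted(glob.glob("bogominitrain.spam.*")), "spam.mbox")
after importing glob, where "spam.mbox" stands in for the real spam
folder. Keeping a copy of the complete folder elsewhere before
overwriting it leaves more messages available for later training.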
>This keeps my ham::spam ratio pretty close to 1::1 and also lets me
That should not be an issue.
>throw away over 90% of my spams without losing training accuracy.
Remember that some messages might later be misclassified due
to additional training.