How to deal with extremely high spam levels

Boris 'pi' Piwinger 3.14 at piology.org
Sat Jul 10 10:03:35 EDT 2004


Bob Vincent <bobvin at pillars.net> wrote:

>I'm running bogominitrain once a day to update my database.

I usually do this only when I encounter an error. That may
be within a week, or later.

>If it doesn't close off in 3 or 4 runs, it's usually a misclassified
>message.  

It is a good idea to always look at the messages used for
training.

Needing that many runs usually indicates -- as you say -- a
misclassification. It might also be due to your very small
number of messages.

>So when that happens, I restore the database, correct the
>error, and re-run bogominitrain.

Perfect.
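
A minimal sketch of the backup-and-restore step, assuming the
database is a single file (for example ~/.bogofilter/wordlist.db;
that path is an assumption, adjust it to your setup). The function
names are hypothetical helpers, not part of bogofilter or
bogominitrain. Call backup_database() before a training run and
restore_database() if the run turns out to have used a
misclassified message.

```python
import shutil
from pathlib import Path


def _backup_path(db_path):
    # The backup lives next to the database, with ".bak" appended.
    db_path = Path(db_path)
    return db_path.with_name(db_path.name + ".bak")


def backup_database(db_path):
    """Copy the database file to '<name>.bak' and return the backup path."""
    backup = _backup_path(db_path)
    shutil.copy2(db_path, backup)
    return backup


def restore_database(db_path):
    """Overwrite the database file with the backup made earlier."""
    backup = _backup_path(db_path)
    shutil.copy2(backup, db_path)
    return Path(db_path)
```

After restoring, move the offending message to the correct
folder and run the training again.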

>Once a week, I delete my database, 

Do you mean you start from scratch?

>register all of my hams (I still
>don't have over 1000 of them), and run bogominitrain again.  Then I
>cat all the bogominitrain.spam.* files together, sort by date, and
>overwrite my spam folder with them.

I recommend keeping more messages for training.
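
For reference, the "cat together, sort by date, overwrite" step
described above could be sketched as follows. This assumes the
bogominitrain.spam.* files and the spam folder are all in mbox
format and that sorting by the Date: header is what is wanted.
merge_mboxes_by_date() and message_date() are hypothetical names
used only for illustration.

```python
import mailbox
import os
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

EPOCH = datetime.fromtimestamp(0, timezone.utc)


def message_date(msg):
    """Return the Date: header as an aware datetime (epoch if unusable)."""
    raw = msg["Date"]
    if raw is None:
        return EPOCH
    try:
        parsed = parsedate_to_datetime(str(raw))
    except (TypeError, ValueError):
        return EPOCH
    if parsed.tzinfo is None:
        # Treat dates without a zone as UTC so all values compare.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def merge_mboxes_by_date(sources, target):
    """Merge the mbox files in `sources` into `target`, oldest first.

    The result is written to a temporary file and then moved over
    `target`, so an existing spam folder is replaced, not appended to.
    Returns the number of messages written.
    """
    messages = []
    for path in sources:
        box = mailbox.mbox(path, create=False)
        messages.extend(box)  # iterating an mbox yields its messages
        box.close()

    tmp = os.fspath(target) + ".tmp"
    if os.path.exists(tmp):
        os.remove(tmp)
    out = mailbox.mbox(tmp)
    for msg in sorted(messages, key=message_date):
        out.add(msg)
    out.close()
    os.replace(tmp, target)
    return len(messages)
```

It could be called with something like
merge_mboxes_by_date(sorted(glob.glob("bogominitrain.spam.*")), "spam"),
where "spam" stands for the path of your spam folder.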

>This keeps my ham::spam ratio pretty close to 1::1 and also lets me

That should not be an issue.

>throw away over 90% of my spams without losing training accuracy.

Remember that some messages might later be misclassified due
to additional training.

pi

