training tactics

Kevin Williams netkev at gmail.com
Thu May 19 05:10:19 CEST 2005


My setup currently puts ham and unsure in the regular inbox and spam
in the junk subfolder for each user.  Every day when I check my mail,
I manually move the spam that got through to the inbox into the trash
folder using my email client.  And of course, the read mail also goes
into the trash.  I have a cron job that retrains bogo every day at
around 3am.

What are the drawbacks of re-training bogofilter every day like I do,
i.e. running it with '-s < [junk folder]' and '-n < [trash folder]'?
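
For concreteness, the nightly job looks roughly like the sketch below.
This is only an illustration of the setup described above, not a
recommended configuration.  The function name retrain, the BOGOFILTER
variable and the two path arguments are made up for the example, and
the -M (mbox mode) flag assumes each folder is stored as a single mbox
file, which may not match every mail store.

```shell
#!/bin/sh
# Hypothetical nightly retraining script, intended to be run from cron.
# BOGOFILTER can be overridden; it defaults to the bogofilter found in PATH.
BOGOFILTER="${BOGOFILTER:-bogofilter}"

# retrain JUNK_MBOX TRASH_MBOX
# Registers every message in JUNK_MBOX as spam and every message in
# TRASH_MBOX as ham.  Both paths are placeholders supplied by the caller.
retrain() {
    junk_mbox="$1"
    trash_mbox="$2"

    # -s registers the input as spam; -M treats stdin as an mbox
    # (assumption: the folder is one mbox file with many messages).
    "$BOGOFILTER" -M -s < "$junk_mbox" || return 1

    # -n registers the input as non-spam (ham).
    "$BOGOFILTER" -M -n < "$trash_mbox" || return 1
}

# Run only when exactly two folder paths are given on the command line.
if [ "$#" -eq 2 ]; then
    retrain "$1" "$2"
fi
```

A crontab line such as "0 3 * * * /path/to/retrain.sh JUNK TRASH" (the
script path and folder names are placeholders) would run it at 3am.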

Obviously, there is the extra time it takes to parse the entire set of
spam and ham, the majority of which has already been parsed on earlier
runs.  I can see how this would not be recommended for a server with a
significant number of users, but my server has fewer than 10 users so I
don't mind this drawback.

I did notice something in the documentation about keyword ages stored
in the db.  I understand that since I retrain every day, the keywords
would always be new (<24 hours).  However, I'm not sure if there is
any drawback to this.

If there are serious drawbacks to the way I do it now, what are some
favorable bogo training scenarios as new spam and ham come in?  I'd
prefer to have something automated and something that is doable from
within the popular mail readers for my users (Outlook, Horde webmail,
any IMAP client really).


