training tactics

Chris Wilkes cwilkes-bf at ladro.com
Thu May 19 05:33:29 CEST 2005


On Wed, May 18, 2005 at 08:10:19PM -0700, Kevin Williams wrote:
> My setup currently puts ham and unsure in the regular inbox and spam
> in the junk subfolder for each user.  Every day when I check my mail,
> the spam that gets through to the inbox is moved into the trash folder
> by me, manually through my email client.  And of course, the read mail
> goes into the trash.  I have a cron event that retrains bogo every day
> at around 3am.

I would move the spam that makes it through your filter into a
"makespam" folder so you can treat it differently if you want to, for
example by running it through -s twice, or through -Ns to unregister it
as ham and register it as spam.
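
A minimal Python sketch of that choice, assuming each missed spam is
available as a single message file.  The function names are
hypothetical; only the -s and -Ns flags come from the discussion above.

```python
import subprocess


def correction_flags(previously_registered_as_ham):
    """Pick bogofilter flags for a spam that slipped into the inbox."""
    # -Ns: unregister the message as ham, then register it as spam.
    # -s:  register as spam only (the message was never trained as ham).
    return ["-Ns"] if previously_registered_as_ham else ["-s"]


def train_missed_spam(message_path, previously_registered_as_ham,
                      runner=subprocess.run):
    """Feed one message file to bogofilter on stdin with the chosen flags."""
    argv = ["bogofilter"] + correction_flags(previously_registered_as_ham)
    with open(message_path, "rb") as fh:
        # 'runner' is injectable so the call can be dry-run without bogofilter.
        return runner(argv, stdin=fh, check=True)
```

With the daily "-n < [trash folder]" run, a missed spam that was read
and trashed has probably already been registered as ham, which is the
case -Ns is meant for.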

> What are the drawbacks of re-training bogofilter every day like I do?
> i.e. running with '-s < [junk folder]' and '-n < [trash folder]'.
> 
> Obviously, there is the extra time it takes to parse the entire set of
> spam and ham where the majority has already been read before.  I can
> see how this would not be recommended for a server with a significant
> number of users but my server has less than 10 users so I don't mind
> this drawback.

The danger I can see is that by constantly retraining on the same spam
and ham, you push the filter toward scoring new email at 0.500 (unsure),
because bogofilter comes to expect spam and ham to match exactly what
you already have.

If you had Maildir mail folders instead of mbox ones you could easily
grab the individual emails that appeared on that day and just train on
those.  What you could do is have all your incoming email saved to your
own inbox and also to a mail folder called "today".

Then at the end of the day process everything in your makespam folder,
then go through your today folder, throwing out the emails that look
like spam.  Register everything left in the today folder as good (-n)
and then delete the folder.
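
As a rough sketch of the Maildir idea, the Python below picks out
messages whose files were modified since a cutoff and builds a command
to register them as ham.  The function names are hypothetical, using
file modification time as the arrival time is an assumption, and so is
the use of bogofilter's -B (bulk mode) to pass message files on the
command line; check your version's man page before relying on it.

```python
import os


def messages_since(maildir, cutoff):
    """Return message paths under maildir/new and maildir/cur whose
    modification time is at or after cutoff (seconds since the epoch)."""
    found = []
    for sub in ("new", "cur"):
        folder = os.path.join(maildir, sub)
        if not os.path.isdir(folder):
            continue
        for name in sorted(os.listdir(folder)):
            path = os.path.join(folder, name)
            if os.path.isfile(path) and os.path.getmtime(path) >= cutoff:
                found.append(path)
    return found


def ham_training_command(paths):
    """Build (but do not run) a command registering the files as ham."""
    # ASSUMPTION: -B makes bogofilter treat the arguments as message files.
    return ["bogofilter", "-n", "-B"] + list(paths)
```

From cron you would call messages_since() with a cutoff of about 24
hours ago (time.time() - 86400) and hand the resulting command to
subprocess.run(), so each message is registered only once.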

I would also work into your script a way to expire your spams into an
"oldspam" directory after a certain time, and then after more time
delete from the oldspam directory.  It'll save you from having to move
things to trash.
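
One way such an expiry step might look in Python, assuming the spam
folder is a flat directory with one file per message (as in a Maildir's
cur directory).  The function name, directory arguments and age
thresholds are all hypothetical.

```python
import os
import shutil
import time

DAY = 86400  # seconds


def expire_spam(spam_dir, oldspam_dir, move_after_days, delete_after_days,
                now=None):
    """Delete very old files from oldspam_dir, then move aged files from
    spam_dir into it.  Returns (moved_names, deleted_names)."""
    now = time.time() if now is None else now
    os.makedirs(oldspam_dir, exist_ok=True)
    moved, deleted = [], []
    # Delete first, so a message always sits in oldspam for at least one run.
    for name in sorted(os.listdir(oldspam_dir)):
        path = os.path.join(oldspam_dir, name)
        if os.path.isfile(path) and now - os.path.getmtime(path) > delete_after_days * DAY:
            os.remove(path)
            deleted.append(name)
    for name in sorted(os.listdir(spam_dir)):
        path = os.path.join(spam_dir, name)
        if os.path.isfile(path) and now - os.path.getmtime(path) > move_after_days * DAY:
            # shutil.move keeps the modification time, so age is still
            # measured from the original timestamp.
            shutil.move(path, os.path.join(oldspam_dir, name))
            moved.append(name)
    return moved, deleted
```

Because the age is always measured from the original modification time,
delete_after_days should be larger than move_after_days, for example 7
and 30.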

> If there are serious drawbacks to the way I do it now, what are some
> favorable bogo training scenarios as new spam and ham comes in?  I'd
> prefer to have something automated and something that is doable from
> within the popular mail readers for my users (outlook, horde webmail,
> any imap client really).

Since you're doing all this stuff via a cronjob on the server you
shouldn't have to worry about your email clients.

Chris


