Automatic site training

David Relson relson at osagesoftware.com
Wed Feb 9 03:16:25 CET 2005


On Mon, 07 Feb 2005 19:55:44 +0100
Steffen Nissen wrote:

> Hi,
> 
> I have run bogofilter for a while now and have had very good results
> with bogofilter. Unfortunately the harddisk on my server crashed along
> with my bogofilter scripts (I know I should have had a backup, but ...).
> 
> Now I have installed bogofilter again and I thought that it might be a
> good idea to use all the bells and whistles in my new setup.
> 
> My general idea is to have each user create a ham and a spam maildir on
> the server where they place all mail which gets classified wrong in
> these dirs. I will then run a script each week or so which trains on
> these files so that the users will not have to do this by themselves.
> 
> I thought of training with one full training iteration and then a few
> iterations with train-on-error.
> 
> My question is then: are any of you running a similar setup, what are
> your experiences with it, and do you even think that this is a good
> idea? Also, does anyone have scripts that do something like this?
> 
> On a side note, I can mention that there will only be a few users on the
> server and that they all have a pretty high level of tech knowledge, so
> there will be no problems with people not being able to copy wrongly
> classified mails to the appropriate folders, nor will there be any
> problems with training taking too long.
> 
> All comments and suggestions are welcome.
> 
> -- 
> Steffen Nissen
> Project Administrator - Fast Artificial Neural Network Library (fann)
> http://fann.sf.net

Hello Steffen,

Welcome to the list.  Sorry to hear about your mishap :-<

When I started using bogofilter, I created a spam-fixups directory and a
cron job that checked it hourly for "correction" files and ran
bogofilter when it found them.  For example, a file named
"hs.0208.2102.txt" would indicate that on Feb 08 at 21:02 a message was
scored as ham, but should have been scored as spam.  The cronjob would
register this message using flags "-Ns".  Now, with Unsures, the script
looks for the following:

  hs.MMDD.HHMM.txt --- bogofilter -Ns
  sh.MMDD.HHMM.txt --- bogofilter -Sn
  us.MMDD.HHMM.txt --- bogofilter -s
  uh.MMDD.HHMM.txt --- bogofilter -n
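
For illustration only, here is a minimal Python sketch of what such an
hourly job could look like.  It is not my actual script.  The function
names and the "done" subdirectory for processed files are assumptions
made up for this example; only the file name prefixes and the bogofilter
flags come from the table above.

```python
import re
import subprocess
from pathlib import Path

# Prefix -> bogofilter flags, taken from the table above.
FLAGS = {"hs": "-Ns", "sh": "-Sn", "us": "-s", "uh": "-n"}
NAME_RE = re.compile(r"^(hs|sh|us|uh)\.\d{4}\.\d{4}\.txt$")


def flags_for(filename):
    """Return the bogofilter flags for a correction file name, or None."""
    match = NAME_RE.match(filename)
    return FLAGS[match.group(1)] if match else None


def run_bogofilter(flags, path):
    """Feed one message to bogofilter on stdin with the given flags."""
    with open(path, "rb") as msg:
        subprocess.run(["bogofilter", flags], stdin=msg, check=True)


def process_dir(directory, runner=run_bogofilter):
    """Register every correction file in `directory`, then move it aside.

    The "done" subdirectory is hypothetical; it only keeps a file from
    being registered twice on the next run.
    """
    done = Path(directory) / "done"
    done.mkdir(exist_ok=True)
    handled = []
    for path in sorted(Path(directory).glob("*.txt")):
        flags = flags_for(path.name)
        if flags is None:
            continue  # not a correction file, leave it alone
        runner(flags, path)
        path.rename(done / path.name)
        handled.append((path.name, flags))
    return handled
```

A cron entry would then call something like process_dir("spam-fixups")
once an hour.  Passing a different `runner` makes it possible to try the
directory handling without touching the wordlist.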

Until recently, I would manually create those files whenever bogofilter
made an error or classified a message as unsure.  My recent change was
to add a procmail recipe and a script that take out the manual work.
When procmail sees a message with

  To: userid+spam-fixups at mydomain.com

it runs the script which extracts the subject and puts the original
message in a "spam-fixups" file.  When the hourly cronjob runs, the
wordlist gets updated.
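
A rough Python sketch of the script side, under several assumptions that
are not part of my setup as described: the first word of the Subject is
taken to be the correction type (hs, sh, us or uh), the misclassified
message is taken to arrive as a message/rfc822 attachment (with the
whole mail used as a fallback), and the names fixup_name,
original_message and save_fixup are invented for the example.

```python
import email
import re
import time
from email import policy
from pathlib import Path


def fixup_name(subject, now=None):
    """Build 'xx.MMDD.HHMM.txt' from a subject starting with hs/sh/us/uh.

    Using the subject's first word as the correction type is an assumption.
    """
    match = re.match(r"\s*(hs|sh|us|uh)\b", subject or "", re.IGNORECASE)
    if not match:
        return None
    stamp = time.strftime("%m%d.%H%M", now or time.localtime())
    return f"{match.group(1).lower()}.{stamp}.txt"


def original_message(raw):
    """Return the attached message/rfc822 part, or the whole mail if none."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    for part in msg.walk():
        if part.get_content_type() == "message/rfc822":
            return part.get_payload(0).as_bytes()
    return raw


def save_fixup(raw, directory, now=None):
    """Write the original message into `directory`; return the path or None."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    name = fixup_name(msg["Subject"], now)
    if name is None:
        return None  # subject does not say what kind of correction this is
    target = Path(directory) / name
    target.write_bytes(original_message(raw))
    return target
```

A procmail recipe would pipe the mail to a small wrapper that calls
save_fixup(sys.stdin.buffer.read(), "spam-fixups").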

It's one way to get the job done :->

An alternate method is Tom Anderson's perl script, available at
http://orderamidchaos.com/bogofilter/bfproxy.  It's more complex and
capable than my simple homegrown solution.  Either technique will work!

HTH,

David


