passthrough training?

David Relson relson at osagesoftware.com
Fri Mar 5 13:37:33 CET 2004


On Fri, 05 Mar 2004 06:54:37 -0500
Tom Allison wrote:

> David Relson wrote:
> 
> >>:0
> >>* ^FROM_someidiot at somewhere.com
> >>| bogofilter -ps | razor-report
> >>
> >>It comes out STDOUT OK, but will it learn?
> > 
> > 
> > Tom,
> > 
> > Why ask when you can easily give it a try?  Let us know what
> > happens!
> > 
> > David
> 
> Because I'm fundamentally terrified of hosing my wordlist!
> Being somewhat new, it's unfamiliar and therefore something to stay
> away from.  Unknown breeds fear.  I can't be any more blunt than that.

Hi Tom,

With a little care, we can cure that fear!

The wordlist can be in any directory, for example,

   BOGODIR="new.test.directory"
   mkdir $BOGODIR
   bogofilter -d $BOGODIR -n < /dev/null
   bogofilter -d $BOGODIR -ps < msg.test
   bogoutil   -d $BOGODIR/wordlist.db

With a cron job, you can back up your wordlist.  Using the date command,
you can keep a rotating backup set.  For example:

   cp $BOGOFILTER_DIR/wordlist.db wordlist.db.`date +%a`

will create files named wordlist.db.Sun, wordlist.db.Mon, ...
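
Here is a minimal sketch of a script such a cron job could run.  It
assumes the wordlist is in $BOGOFILTER_DIR (defaulting to ~/.bogofilter);
BACKUP_DIR and the function name backup_wordlist are made up for
illustration, so adjust them to your setup:

```shell
#!/bin/sh
# Sketch: keep a rotating, one-per-weekday backup of the bogofilter wordlist.
# BACKUP_DIR and backup_wordlist are hypothetical names, not part of bogofilter.
BOGOFILTER_DIR="${BOGOFILTER_DIR:-$HOME/.bogofilter}"
BACKUP_DIR="${BACKUP_DIR:-$HOME/bogofilter.backup}"

backup_wordlist() {
    src="$BOGOFILTER_DIR/wordlist.db"
    # %a is the abbreviated weekday name, so each copy is overwritten a week later
    dst="$BACKUP_DIR/wordlist.db.$(date +%a)"
    [ -f "$src" ] || { echo "no wordlist at $src" >&2; return 1; }
    mkdir -p "$BACKUP_DIR" && cp -p "$src" "$dst"
}
```

Add a call to backup_wordlist at the end of the script and run the script
from cron.  Keeping the date command inside a script also avoids having
to escape the % character, which is special in a crontab line.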


> So I tried this:
> 
> first, backup wordlist to text file (bogoutil -d)
> 
> bogofilter -pNs < mailfile
> bogofilter -vv <mailfile
> bogofilter -pnS <mailfile
> bogofilter -vv <mailfile
> 
> a few times with different emails, and played with some -vvv options 
> too.  This appears to return the database back to where it started.
> 
> It seems that:
> 1) -p works very nicely with these other options to teach/correct
> 2) I'm slowly learning about the Care and Feeding of bogofilter.

Good.  It takes a while :-)
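
If you want to repeat that kind of check without touching your real
wordlist, something like the following might do.  It is only a sketch:
roundtrip_check, before.txt and after.txt are names I made up, and it
uses a simpler register/unregister pair (-s then -S) on a scratch
wordlist rather than your exact sequence:

```shell
#!/bin/sh
# Sketch: train and untrain one message in a scratch wordlist, then compare dumps.
# roundtrip_check, before.txt and after.txt are hypothetical names.
roundtrip_check() {
    msg="$1"                      # a file holding a single mail message
    command -v bogofilter >/dev/null 2>&1 || {
        echo "bogofilter not installed" >&2
        return 2
    }
    dir=$(mktemp -d) || return 1  # scratch directory, never your real wordlist
    bogofilter -d "$dir" -n < /dev/null            # create an empty wordlist
    bogoutil -d "$dir/wordlist.db" > "$dir/before.txt"
    bogofilter -d "$dir" -ps < "$msg" > /dev/null  # register as spam, discard the passthrough copy
    bogofilter -d "$dir" -pS < "$msg" > /dev/null  # unregister the same message
    bogoutil -d "$dir/wordlist.db" > "$dir/after.txt"
    diff "$dir/before.txt" "$dir/after.txt"        # any output is a difference between the dumps
    status=$?
    rm -rf "$dir"
    return $status
}
```

Run it as "roundtrip_check msg.test".  Whatever diff prints is what the
round trip left behind in the scratch wordlist.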

> I also made the wonderful mistake of accidentally changing my min_devs
> such that robx was outside this window.  The resulting amount of
> unsure mail was intense.
> But it taught me how well (and why) this works.  The difference was
> astounding: the main robx contributions were the Bayesian gibberish
> they so love to dump upon us.

As long as you don't direct spam to /dev/null, experimentation is safe.
When I started using bogofilter, I too was unsure.  The first thing my
procmailrc does is copy each incoming message to file mail.backup.  Each
day at 23:59, a cron job moves mail.backup to mail.`date +%m%d`.  That
has resulted in an archive of all incoming mail since I started running
bogofilter.  (Actually, "mail.backup" is step 2.  Step 1 is to send
copies of W32.Swen to /dev/null, which has saved some hundreds of MB.)
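
The nightly move can be sketched roughly as below.  This is an
illustration, not my actual script: the MAILDIR default and the function
name rotate_mail_backup are assumptions.

```shell
#!/bin/sh
# Sketch: nightly rotation of the procmail backup file.
# The MAILDIR default and rotate_mail_backup are hypothetical.
MAILDIR="${MAILDIR:-$HOME/mail}"

rotate_mail_backup() {
    src="$MAILDIR/mail.backup"
    dst="$MAILDIR/mail.$(date +%m%d)"   # e.g. mail.0305 on March 5
    [ -s "$src" ] || return 0           # missing or empty file: nothing to move
    mv "$src" "$dst"
}
```

Add a call to rotate_mail_backup at the end and schedule the script with
a crontab line such as the following (the script path is hypothetical):

   59 23 * * * $HOME/bin/rotate-mail.sh

Note that names built from %m%d repeat after a year, so an old file
with the same name would be overwritten.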

David



