passthrough training?
David Relson
relson at osagesoftware.com
Fri Mar 5 13:37:33 CET 2004
On Fri, 05 Mar 2004 06:54:37 -0500
Tom Allison wrote:
> David Relson wrote:
>
> >>:0
> >>* ^FROM_someidiot at somewhere.com
> >>| bogofilter -ps | razor-report
> >>
> >>It comes out STDOUT OK, but will it learn?
> >
> >
> > Tom,
> >
> > Why ask when you can easily give it a try. Let us know what
> > happens!
> >
> > David
>
> Because I'm fundamentally terrified of hosing my wordlist!
> Being somewhat new, it's unfamiliar and therefore something to stay
> away from. Unknown breeds fear. I can't be any more blunt than that.
Hi Tom,
With a little care, we can cure that fear!
The wordlist can live in any directory, so you can experiment on a
scratch copy without touching your real one. For example:
BOGODIR="new.test.directory"
mkdir $BOGODIR
bogofilter -d $BOGODIR -n < /dev/null
bogofilter -d $BOGODIR -ps < msg.test
bogoutil -d $BOGODIR/wordlist.db
With a cron job, you can back up your wordlist. Using the date command,
you can keep a rotating backup set. For example:
cp $BOGOFILTER_DIR/wordlist.db wordlist.db.`date +%a`
will create files named wordlist.db.Sun, wordlist.db.Mon, ...
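As a rough sketch, the same rotation can be wrapped in a small shell
function for a cron job to call. The function name backup_wordlist and
its two directory arguments are illustrative assumptions, not anything
bogofilter provides:

```sh
#!/bin/sh
# Sketch only: keep a weekday-named rotating copy of wordlist.db.
# backup_wordlist SRC_DIR DEST_DIR   (hypothetical helper, both args assumed)
backup_wordlist() {
    src_dir=$1
    dest_dir=$2
    # %a is the abbreviated weekday name (Sun, Mon, ... in the C locale),
    # so each copy is overwritten one week later.
    day=$(date +%a)
    mkdir -p "$dest_dir" || return 1
    cp "$src_dir/wordlist.db" "$dest_dir/wordlist.db.$day"
}
```

If you put the cp command directly in a crontab line instead, remember
that cron treats a bare % specially, so it has to be written as
date +\%a there.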
> So I tried this:
>
> first, backup wordlist to text file (bogoutil -d)
>
> bogofilter -pNs < mailfile
> bogofilter -vv <mailfile
> bogofilter -pnS <mailfile
> bogofilter -vv <mailfile
>
> a few times with different emails, and played with some -vvv options
> too. This appears to return the database back to where it started.
>
> It seems that:
> 1) -p works very nicely with these other options to teach/correct
> 2) I'm slowly learning about the Care and Feeding of bogofilter.
Good. It takes a while :-)
> I also made the wonderful mistake of accidentally changing my min_devs
> such that robx was outside this window. The resulting amount of
> unsure mail was intense.
> But it taught me how well (and why) this works. The difference was
> astounding: the main robx contributions were the bayesian gibberish
> they so love to dump upon us.
As long as you don't direct spam to /dev/null, experimentation is safe.
When I started using bogofilter, I too was unsure. The first thing my
procmailrc does is copy each incoming message to the file mail.backup. Each
day at 23:59, a cron job moves mail.backup to mail.`date +%m%d`. That
has resulted in an archive of all incoming mail since I started running
bogofilter. (Actually, "mail.backup" is step 2. Step 1 is to send
copies of W32.Swen to /dev/null, which has saved some hundreds of MB).
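The nightly rename could look something like the sketch below. The
function name rotate_mail_backup and the directory argument are
assumptions for illustration; only the mail.backup and mail.MMDD names
come from the description above:

```sh
#!/bin/sh
# Sketch only: move the day's mail.backup to a date-stamped archive file.
# rotate_mail_backup MAIL_DIR   (hypothetical helper, argument assumed)
rotate_mail_backup() {
    mail_dir=$1
    # Nothing to do if no mail arrived today (file missing or empty).
    [ -s "$mail_dir/mail.backup" ] || return 0
    # %m%d gives month and day, e.g. 0305, so names repeat after a year
    # and mv would then overwrite the older file.
    mv "$mail_dir/mail.backup" "$mail_dir/mail.$(date +%m%d)"
}
```

Run from cron at 23:59, this leaves one file per day and lets the next
incoming message start a fresh mail.backup.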
David