bogolearn

Zack Brown zbrown at tumblerings.org
Mon Apr 7 17:25:24 CEST 2003


On Sat, Apr 05, 2003 at 02:29:42PM -0500, David Relson wrote:
> At 02:21 PM 4/5/03, Kevin McKinley wrote:
> 
> >I found this "bogolearn" script at O'Reilly. (I changed "GOOD" to "HAM"). I
> >offer it here for comment, suggested improvements, or whatever.

I use a similar script, but I set mine to delete the files in ~/.bogofilter
before starting, so the whole database is constructed with the same version
of bogofilter. That seems like a good precaution, since bugs can creep in
otherwise. Every few months, I regenerate the database this way, and it
seems to cut down on false negatives.

This requires that I keep an ever-growing spam archive lying around.
It's up to 450 megs by now, including the initial archive I downloaded
off the web.

I also only run bogofilter on email that does not get snagged for
mailing list folders, so spam still goes into my mailing list folders. I
just have too many folders to go through them all the time looking
for false negatives, and if I let bogofilter train on that unreviewed
email, its mistakes would gradually degrade the database. True, I
regenerate everything once in a while anyway, but I imagine I'll taper
off on that as bogofilter matures.

I thought of putting bogofilter in my procmailrc file twice, once at the top,
where it would catch spam but not train itself, and once at the bottom after
all my mailing list recipes, where it would catch spam *and* train, before
sending the remaining mail to my inbox. But I haven't done it yet. Maybe
I'd rather put up with more false negatives in my mailing list folders
than false positives in the spam folder.
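
The two-pass idea can be sketched in shell rather than as procmail
recipes. This is a hedged sketch, not a tested setup: the function names
and the route helper are hypothetical, and it relies on two things
recalled from the bogofilter man page that should be checked against the
installed version, namely that exit status 0 means spam and that -u
updates the word lists with bogofilter's own verdict. It also rearranges
the idea slightly: the mailing list match is checked first, so each
message is classified once, list mail without training and everything
else with training.

```shell
#!/bin/sh
# Sketch: classify list mail without training, train only on the rest.
# Assumptions to verify locally: exit status 0 = spam, and -u registers
# the message according to bogofilter's own classification.

BOGOFILTER="${BOGOFILTER:-bogofilter}"

# is_spam FILE: classify only; the word lists are left untouched.
is_spam() {
    "$BOGOFILTER" < "$1"
}

# is_spam_and_train FILE: classify and register the verdict.
is_spam_and_train() {
    "$BOGOFILTER" -u < "$1"
}

# route FILE LISTFOLDER: print where the message would be delivered.
# LISTFOLDER is the mailing list folder that matched, or empty if none.
route() {
    msg=$1
    list=$2
    if [ -n "$list" ]; then
        # List mail: catch spam, but never train on unreviewed folders.
        if is_spam "$msg"; then echo spam; else echo "$list"; fi
    else
        # Remaining mail: catch spam and train before the inbox.
        if is_spam_and_train "$msg"; then echo spam; else echo inbox; fi
    fi
}
```

Any non-zero exit status, including an error, is treated as "not spam"
here, which errs on the side of false negatives rather than false
positives.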

Be well,
Zack

> >
> >Kevin
> >
> >#!/bin/sh
> >
> >BOGOFILTER="/usr/bin/bogofilter";
> >HAMDIR="/path/to/ham";
> >SPAMDIR="/path/to/spam";
> >
> >cd $SPAMDIR;
> >echo Spam:
> >for i in *;
> >do
> >   echo Processing Mail ID \#$i;
> >   bogofilter -s -v < $i ;
> >done;
> >
> >cd $HAMDIR;
> >echo Ham:
> >for i in *;
> >do
> >   echo Processing Mail ID \#$i;
> >   bogofilter -n -v < $i ;
> >done;
> 
> Two comments:  way too many semicolons; could be simplified as:
> 
> #!/bin/sh
> 
> BOGOFILTER="/usr/bin/bogofilter"
> HAMDIR="/path/to/ham"
> SPAMDIR="/path/to/spam"
> 
> # Register every message in the spam archive as spam.
> echo Spam:
> for i in "$SPAMDIR"/*; do
>    echo "Processing Mail ID #$i"
>    "$BOGOFILTER" -s -v < "$i"
> done
> 
> # Register every message in the ham archive as non-spam.
> echo Ham:
> for i in "$HAMDIR"/*; do
>    echo "Processing Mail ID #$i"
>    "$BOGOFILTER" -n -v < "$i"
> done
> 
> Also, the two lines of output for each message could be combined on a 
> single line, e.g.
> 
> printf '%s ' "$i"
> bogofilter -s -v < "$i"
> 
> 

-- 
Zack Brown



