Training scripts

Matej Cepl cepl at surfbest.net
Tue Jan 27 23:40:26 CET 2004


On Tuesday 27 of January 2004 10:06, Stroller wrote:
> I've posted this script for training before: 
> <http://article.gmane.org/gmane.mail.bogofilter.general/5935>
> See the attachment link at the bottom - I hope you'll maybe find it
> useful with respect to the find command.

I'll check it out.

> I'm pretty new to Bash scripting myself, so I'm having some problems
> reading your scripts. I'm posting not because I'm an expert, but
> because I welcome any discussion & enlightenment on the subject.
> I'm a little unclear why you appear to be calling KMail, for instance,
> and your use of the `formail` command suggests to me you're doing
> something cleverer than I.

I am not calling KMail at all, just getting rid of some 
additional email headers put there by bogofilter and KMail.

> All my script does is train Bogofilter on new messages in spam & ham
> (maildir) folders respectively.

I am only doing train-on-error here, so my setup is slightly different.

> What I've realised, however, is that if I run my script then move a
> message, say, from my inbox (which is ignored in case it has spam in
> it) to a saved items folder, then subsequent runs of my script will not
> train on that message.

You surely know these two pages by heart, don't you?
http://cr.yp.to/proto/maildir.html
http://www.qmail.org/man/man5/maildir.html

>However I found that using `find... -print0 | xargs` was 
>*considerably* faster than `find... -exec bogofilter -s -W -v -I 
>\{\} \; `, which calls for Bogofilter to be repeatedly restarted 
>with each message. IIRC using that latter method took about 20 
>or 40 minutes to build a database based on my modest message 
>corpus; with the script the way it is I can move older messages 
>around & completely rebuild my database from scratch (by 
>removing the old one) in less than 5 minutes.

I retrain on only about 10 messages a day in a single run, so
speed is not much of an issue for me.
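
The speed difference described in the quote comes from process start-up: `-exec ... \;` launches the command once per message, while `-print0 | xargs -0` passes many file names to each invocation. The sketch below shows both forms side by side. The function names `batch_over_files` and `one_by_one` are hypothetical, the command to run is passed in as arguments rather than hard-coded, and `-print0`/`-0` are assumed to be available (they are in GNU and BSD find/xargs).

```shell
#!/bin/sh
# batch_over_files DIR COMMAND [ARGS...]
# Run COMMAND with as many file names per invocation as xargs can fit.
# NUL-separated names keep file names with spaces or newlines intact.
batch_over_files() {
    dir=$1
    shift
    find "$dir" -type f -print0 | xargs -0 "$@"
}

# one_by_one DIR COMMAND [ARGS...]
# Slow variant for comparison: COMMAND is started once per file.
one_by_one() {
    dir=$1
    shift
    find "$dir" -type f -exec "$@" {} \;
}
```

For example, `batch_over_files ~/Maildir/.spam/cur bogofilter -s -B` would register a whole spam folder in a handful of invocations. The path is a placeholder, and this assumes a bogofilter version whose `-B` option accepts several message files on the command line; check the man page of your version first. Note also that GNU xargs runs the command once even when find produces no files, unless `-r` is given.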

Matej

-- 
Matej Cepl, http://www.ceplovi.cz/matej
GPG Finger: 89EF 4BC6 288A BF43 1BAB  25C3 E09F EF25 D964 84AC
138 Highland Ave. #10, Somerville, Ma 02143, (617) 623-1488
 
Science is meaningless because it gives no answer to our
question, the only question important to us: ``What shall we do
and how shall we live?''
    -- Lev Nikolaevich Tolstoy



