Training scripts
Matej Cepl
cepl at surfbest.net
Tue Jan 27 07:10:24 CET 2004
Could anybody help me with writing scripts for bogofilter?
I have two problems, both with the find command (I use three-state
filtering, with bogofilter trained on error to exhaustion by the
bogominitrain.pl script):
1) This is my script going through _ham and _spam maildir folders
and retraining bogofilter as necessary:
#!/bin/sh
MAILDIR=$HOME/.mail/
# Register everything in _spam as spam, then move it to _junk.
for msg in "$MAILDIR"/_spam/*/* ; do
    formail -I X-Bogosity -I X-KMail -s bogofilter -vs < "$msg"
    STR=$(basename "$msg")
    mv "$msg" "$MAILDIR/_junk/new/${STR%*:2,S}" \
        >/dev/null 2>&1
done
# Register everything in _ham as ham, then move it to the inbox.
for msg in "$MAILDIR"/_ham/*/* ; do
    formail -I X-Bogosity -I X-KMail -s bogofilter -vn < "$msg"
    STR=$(basename "$msg")
    mv "$msg" "$MAILDIR/inbox/new/${STR%*:2,S}" \
        >/dev/null 2>&1
done
I would love to replace the for loops with a find command, but I
do not know how to pull it off:
a) how to run three commands from a single find invocation,
using the same {} placeholder twice, and
b) does anybody know how to avoid the STR variable (i.e.,
to use basename directly in a ${...%..} bash expression)?
${$(basename $msg)%*:2,S} doesn't work.
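One possible approach to both a) and b), as a sketch only: hand
each file to an inline script via find -exec sh -c, so the path
arrives as a positional parameter that can be used any number of
times, and replace basename with two successive parameter
expansions (nesting a command substitution inside ${...} is not
valid shell syntax). The temporary directory and the file name
below are made up for the demonstration, and "cat > /dev/null"
is only a stand-in for the formail/bogofilter step.

```shell
#!/bin/sh
# Sketch only: a throwaway tree stands in for $HOME/.mail, and
# "cat > /dev/null" stands in for the formail ... bogofilter step.
MAILDIR=$(mktemp -d)
mkdir -p "$MAILDIR/_spam/cur" "$MAILDIR/_junk/new"
printf 'Subject: test\n\nbody\n' \
    > "$MAILDIR/_spam/cur/1075183824.1234.host:2,S"

# find passes the destination as $1 and each file as $2 to the inline
# script, so the path can be reused and several commands run per file.
find "$MAILDIR/_spam" -type f -exec sh -c '
    dest=$1; msg=$2
    name=${msg##*/}      # strip the directory part, like basename
    name=${name%:2,*}    # strip the Maildir flag suffix, e.g. ":2,S"
    # stand-in for: formail -I X-Bogosity -I X-KMail -s bogofilter -vs < "$msg"
    cat < "$msg" > /dev/null
    mv "$msg" "$dest/$name"
' sh "$MAILDIR/_junk/new" {} \;

ls "$MAILDIR/_junk/new"
```

The pattern ":2,*" also removes flag combinations other than
plain "S", which may or may not be what is wanted.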
2) The other script that relies on find is this one, which
collects ham and spam corpora for training from live data:
#!/bin/sh
TMPBOX=$HOME/mbox.tmp
cat /dev/null > $TMPBOX
find $HOME/Maildir/ $HOME/.mail/ -type f -name 10\* \
\! -iregex $HOME/.\*/_junk.\* \
-exec cat '{}' >> $TMPBOX \;
formail -I "X-Bogosity:" -I "X-KMail" -ds < $TMPBOX >> $HOME/ham
cat /dev/null > $TMPBOX
find $HOME/.mail/_junk -type f -name 10\* \
-exec cat '{}' >> $TMPBOX \;
formail -I "X-Bogosity:" -I "X-KMail" -ds < $TMPBOX >> $HOME/spam
rm -f $TMPBOX >/dev/null 2>&1
unset TMPBOX
Obviously, what I would love to achieve is to get rid of
$TMPBOX and run formail directly from find -exec. When I try
this:
#!/bin/sh
find $HOME/.mail/_junk -type f -name 10\* \
-exec formail -I "X-Bogosity:" -I "X-KMail" -ds \
< '{}' >> $HOME/spam \;
I get the error ``./bogocorpus: line 5: {}: neither file nor
directory'' (translated into English from the localized error
message). Any thoughts on this?
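A likely explanation is that the calling shell handles the
"< '{}'" redirection itself, before find ever starts, so it looks
for a file literally named {}. One way around it, sketched below
under made-up assumptions, is to let an inner shell do the
redirection once per file. The temporary directory, the file
names and the output file are invented for the demonstration,
and grep -v is only a stand-in for the formail call.

```shell
#!/bin/sh
# Sketch only: a temporary directory stands in for $HOME/.mail/_junk,
# and "grep -v" stands in for:
#   formail -I "X-Bogosity:" -I "X-KMail" -ds
JUNK=$(mktemp -d)
OUT=$(mktemp)
printf 'X-Bogosity: Spam\nSubject: one\n' > "$JUNK/1075183824.a"
printf 'X-Bogosity: Spam\nSubject: two\n' > "$JUNK/1075183825.b"

# The inner shell performs the redirections for each file, after find
# has substituted the real path for {}.
find "$JUNK" -type f -name '10*' -exec sh -c '
    grep -v "^X-Bogosity:" < "$1" >> "$2"
' sh {} "$OUT" \;

# find does not guarantee any order, so sort before displaying.
sort "$OUT"
```

With the real command in place of grep, the inner line would
presumably read: formail -I "X-Bogosity:" -I "X-KMail" -ds
< "$1" >> "$2".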
Thanks a lot,
Matej
--
Matej Cepl, http://www.ceplovi.cz/matej
GPG Finger: 89EF 4BC6 288A BF43 1BAB 25C3 E09F EF25 D964 84AC
138 Highland Ave. #10, Somerville, Ma 02143, (617) 623-1488
In political activity men sail a boundless and bottomless sea;
there is neither harbor for shelter nor floor for anchorage,
neither starting point nor appointed destination.
-- Michael Oakeshott: Rationalism in Politics