Request for FAQ contributions

R Kimber rkimber at ntlworld.com
Wed Mar 16 18:16:13 CET 2005


On Mon, 7 Mar 2005 20:02:20 -0500
David Relson <relson at osagesoftware.com> wrote:

> Hi y'all,
> 
> It's been suggested that bogofilter's FAQ ought to have info on using
> bogofilter with the popular MUA's.  I know we've got people running
> bogofilter with/from mutt, sylpheed-claws, kmail, thunderbird, etc.
> If any of you would care to answer the question "How do I use
> bogofilter with X?", I'll add the info to the FAQ.

Hi:

David has asked for information about people's use of Bogofilter for the
FAQ.

Here is a description of what I do. I'm not an expert. I've included an
outline of the whole system, and not just the bogofilter part, because
I think Bogofilter is part of a system and it might help someone to see
the whole system. 


Bogofilter and Sylpheed

I use:

fetchmail/mailfilter
procmail
bogofilter
sylpheed

###############################
# Configuration created Fri Aug  2 17:14:15 2002 by fetchmailconf
set postmaster "postmaster"
set daemon 180
set no bouncemail
set no spambounce
# set syslog
set properties ""
poll <ISP pop server address> with proto POP3
      user '<ISP username>' there with password '<password>'
      is '<local username>' here options antispam -1
      mda "formail -s procmail"
      preconnect "mailfilter"
###############################

Mailfilter filters out a few obvious microsoft-type problems with
standard rules such as:

DENY=^Content-(Type|Disposition):.*(file)?name=.*\.(asd|bat|chm|cmd|com|dll|exe|hlp|hta|js|jse|lnk|ocx|pif|scr|shb|shm|shs|vb|vbe|vbs|vbx|vxd|wav|wsf|wsh)
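
A rule like this can be sanity-checked by running the same pattern over a
few sample header lines before trusting it.  What follows is only a
sketch: the name is_denied is made up, the extension list is shortened,
and it assumes case-insensitive extended-regex matching (grep -E -i),
which may not be exactly how your mailfilter is configured.

```shell
#!/bin/sh
# Sketch: test a mailfilter-style DENY pattern against one header line.
# DENY_RE is a shortened copy of the rule above (hypothetical subset
# of the extensions, chosen only to keep the example short).
DENY_RE='^Content-(Type|Disposition):.*(file)?name=.*\.(bat|com|exe|pif|scr|vbs)'

# is_denied HEADER_LINE
# Exit status 0 if the pattern matches the line, 1 otherwise.
is_denied() {
    printf '%s\n' "$1" | grep -E -i -q -- "$DENY_RE"
}

if is_denied 'Content-Disposition: attachment; filename="invoice.exe"'; then
    echo "would be denied"
fi
```

Note that the pattern is not anchored at the end, so an extension such as
"js" also matches longer names that merely begin with it.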

Downloaded mail then gets passed to procmail.
Basically, the procmail recipes sort messages into three kinds.

1. Some stuff I don't want under any circumstances goes to /dev/null

2. The stuff I know I want gets delivered directly to the relevant
Sylpheed folder. In the case of some mailing lists I have a recipe that
ensures that any reply I make actually goes to the list and not to the
individual. For example: 
:0f
* ^(To|Cc):.*amd64
| /usr/bin/formail -bfi "Reply-To: suse-amd64@suse.com"
:0 a:
<path to sylpheed folder>/Suse/amd64/.
:0e
{ EXITCODE=75
    HOST }


3. The rest goes through bogofilter:
:0fw
|/usr/local/bin/bogofilter -u -e -p
:0e
{ EXITCODE=75
    HOST }

To make life simple, when I test for spam I divide it into two
categories using a crude empirically-determined recipe that picks out
the unreadable spam.  That way I can more easily and quickly glance
down the spam list to check for false positives.  There are, no doubt,
better ways of doing it:

# test for spam
:0
* ^X-Bogosity: Spam
{
   :0
   * ^Subject:.*[<unwanted characters put here>]
   <path to sylpheed folder>/unreadable/.
   :0
   <path to sylpheed folder>/spam/.
}
:0e
{ EXITCODE=75
    HOST }

Then I filter out the unsure messages:
:0:
* ^X-Bogosity: Unsure
<path to sylpheed folder>/UNCERTAIN/.
:0e
{ EXITCODE=75
    HOST }

and finally everything else goes to my sylpheed inbox:
:0
<path to sylpheed folder>/inbox/.
:0e
{ EXITCODE=75
    HOST }
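
For testing outside procmail, the routing decision made by the recipes
above can be mimicked in a few lines of shell.  This is a rough sketch,
not what procmail does internally: route_message is a made-up helper, it
matches case-sensitively (procmail does not), it ignores folded header
lines, and it leaves out the unreadable/spam split because that
character class is site-specific.

```shell
#!/bin/sh
# Sketch: print the folder the recipes above would pick for the
# message on standard input.  Folder names mirror the recipes.
route_message() {
    # Read only the header block (up to the first blank line) and
    # keep the first X-Bogosity line, if there is one.
    bogosity=$(sed -n '/^$/q; /^X-Bogosity: /{p;q;}')
    case "$bogosity" in
        "X-Bogosity: Spam"*)   echo "spam" ;;
        "X-Bogosity: Unsure"*) echo "UNCERTAIN" ;;
        *)                     echo "inbox" ;;
    esac
}

# Example (the path is hypothetical):
# route_message < "$HOME/Mail/inbox/1"
```

Feeding it a message that has already passed through "bogofilter -p"
shows where that message would have been filed.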

Sylpheed refreshes its folders each time it checks for mail, so I have
it set to autocheck at frequent intervals and the new mail appears.
The one disadvantage of this is that you cannot then use Sylpheed's
built-in filter system, because procmail's direct delivery to the
folders bypasses it.  That means I cannot have messages automatically
color-coded, though I could invoke the filter system by hand if I
wanted.  I prefer the power of procmail.

When I discover a false positive, false negative, or an uncertain mail,
I run one of two actions.  For false negatives and uncertains that are
spam I have an action

/usr/local/bin/bogtospam %f

This script
(a) unlearns the message if it has wrongly been classified as ham;
(b) relearns the message as spam.  Learning takes place up to four
times, since I understood from an investigation by Greg that there were
no real benefits beyond that;
(c) moves the message to a spam archive, creating a unique filename so
that it doesn't overwrite an existing message.

The script is:-
========================
#!/bin/bash
# bogtospam
# This unlearns wrongly classified "ham" messages
# learns "unsure" messages, or the unlearnt wrongly classified "ham"
# messages, as "spam"
# if necessary up to 4 times
# (http://www.bgl.nu/bogofilter/reptrain.html) and moves the message to
# the spam archive
FILE=$1
# print filename
echo "file=$FILE"
# print the X-Bogosity line
grep spamicity $FILE
# Now unlearn it if it was classified as ham
CHECK=`/usr/local/bin/bogofilter -T < "$FILE"`
# char counting in strings starts at 0
BIT=${CHECK:0:1}
if [ "$BIT" = "H" ]
then
  /usr/local/bin/bogofilter -N < $FILE
  CHECK=`/usr/local/bin/bogofilter -T < $FILE`
  echo "classification after unlearning [-N]: $CHECK"
fi
# now relearn as spam
CHECK=`/usr/local/bin/bogofilter -T < "$FILE"`
BIT=${CHECK:0:1}
if [ "$BIT" = "U" ] || [ "$BIT" = "H" ]
then
  let num=0
  while [ $num -lt 4 ]
  do
    CHECK=`/usr/local/bin/bogofilter -T < "$FILE"`
    echo "after $num training passes: $CHECK"
    BIT=${CHECK:0:1}
    if [ "$BIT" = "U" ] || [ "$BIT" = "H" ]
    then
      /usr/local/bin/bogofilter -s -e < "$FILE"
      if [ $? -gt 0 ]
      then
        echo "Problem: retrained message not relearned"
        exit 1
      fi
      # count only the passes that actually trained the message
      let num=$num+1
    else
      break
    fi
  done
  if [ $num -eq 1 ]
  then
    echo "message relearned once"
  elif [ $num -eq 2 ]
  then
    echo "message relearned twice"
  else
    echo "message relearned $num times"
  fi
fi
echo "Final classification = `/usr/local/bin/bogofilter -T < $FILE`"
# now archive it
F1=`basename $FILE`
F2=`mktemp <path to spam archive>/spam/$F1.XXXXXX`
mv $FILE $F2
if [ $? -eq 0 ]
then
  echo "moved $FILE to $F2"
else
  echo "Problem: message not moved"
fi
===============================
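
The heart of the script, "train until the classification changes, but at
most four times", can also be written so that it runs under any POSIX
shell and counts only the passes that really trained the message.  This
is a sketch under assumptions, not a replacement for the script:
train_until_spam, classify, learn_spam, BOGOFILTER and MAX_PASSES are
names invented here, and the bogofilter path is the one used above.

```shell
#!/bin/sh
# Sketch: retrain a message as spam until bogofilter stops calling it
# ham or unsure, giving up after MAX_PASSES training passes.
BOGOFILTER=${BOGOFILTER:-/usr/local/bin/bogofilter}
MAX_PASSES=4

# Hypothetical wrappers, kept separate so they can be swapped for
# stand-ins when trying the loop without touching the word lists.
classify()   { "$BOGOFILTER" -T < "$1"; }     # one letter (S, H or U) and a number
learn_spam() { "$BOGOFILTER" -s -e < "$1"; }  # register the message as spam

# train_until_spam FILE
# Prints the number of training passes performed; returns non-zero
# as soon as a training pass fails.
train_until_spam() {
    file=$1
    passes=0
    while [ "$passes" -lt "$MAX_PASSES" ]; do
        case "$(classify "$file")" in
            H*|U*) ;;        # still ham or unsure: train again
            *) break ;;      # spam, or no output at all: stop
        esac
        learn_spam "$file" || return 1
        passes=$((passes + 1))
    done
    echo "$passes"
}
```

Matching the first letter with "case" avoids the bash-only substring
expansion, and incrementing the counter after the training step keeps
the reported number equal to the number of times the message was
actually learned.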

With unsures that are ham, and with false positives, I use bogtoham.
This (a) unlearns the false positive
(b) learns the message (up to four times)
(c) copies the message to an archive of ham
(d) moves the message to my inbox for attention

===============================
#!/bin/bash
# bogtoham
# This unlearns wrongly classified "spam" messages, learns "unsure"
# messages, or wrongly classified "spam" messages, as "ham"
# if necessary up to 4 times
# (http://www.bgl.nu/bogofilter/reptrain.html), copies the message to
# the ham archive and moves it to the inbox
FILE=$1
# print filename
echo "file=$FILE"
# print the X-Bogosity line
grep spamicity $FILE
# Now unlearn it if it was classified as spam and relearn as ham
CHECK=`/usr/local/bin/bogofilter -T < "$FILE"`
# NB: char counting in strings starts at 0
BIT=${CHECK:0:1}
if [ "$BIT" = "S" ]
then
  /usr/local/bin/bogofilter -S < $FILE
  CHECK=`/usr/local/bin/bogofilter -T < $FILE`
  echo "classification after unlearning [-S]: $CHECK"
fi
# now relearn as ham
CHECK=`/usr/local/bin/bogofilter -T < "$FILE"`
BIT=${CHECK:0:1}
if [ "$BIT" = "U" ] || [ "$BIT" = "S" ]
then
  let num=0
  while [ $num -lt 4 ]
  do
    CHECK=`/usr/local/bin/bogofilter -T < "$FILE"`
    BIT=${CHECK:0:1}
    echo "after $num training passes: $CHECK"
    if [ "$BIT" = "U" ] || [ "$BIT" = "S" ]
    then
      /usr/local/bin/bogofilter -n -e < "$FILE"
      if [ $? -gt 0 ]
      then
        echo "Problem: retrained message not relearned"
        exit 1
      fi
      # count only the passes that actually trained the message
      let num=$num+1
    else
      break
    fi
  done
  if [ $num -eq 1 ]
  then
    echo "message relearned once"
  elif [ $num -eq 2 ]
  then
    echo "message relearned twice"
  else
    echo "message relearned $num times"
  fi
fi
echo "Final classification = `/usr/local/bin/bogofilter -T < $FILE`"
# now archive the file
F1=`basename $FILE`
F2=`mktemp <path to archive>/goodmail/$F1.XXXXXX`
cp $FILE $F2
if [ $? -eq 0 ]
then
  echo "copied $FILE to $F2"
else
  echo "Problem: message not copied"
fi
# now move to MH inbox for attention
let name=`ls <path to sylpheed folders>/inbox |sort -n|tail -1`
let name=$name+1
mv $FILE <path to sylpheed folders>/inbox/$name
================================
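
The last step of bogtoham, choosing the next message number in the MH
inbox, fails if the folder is empty or if the highest-sorting name is not
a number.  A slightly more defensive way to compute it might look like
the sketch below; next_mh_number is a hypothetical helper, and like the
original it is not safe against another process delivering to the same
folder at the same moment.

```shell
#!/bin/sh
# Sketch: pick the next free message number in an MH-style folder.
# Files whose names are not plain positive integers are ignored;
# an empty folder yields 1.
next_mh_number() {
    dir=$1
    max=0
    for path in "$dir"/*; do
        name=${path##*/}
        case "$name" in
            ''|0*|*[!0-9]*) continue ;;   # skip non-numeric names and an unmatched glob
        esac
        [ "$name" -gt "$max" ] && max=$name
    done
    echo $((max + 1))
}

# Example (the folder path is hypothetical):
# mv "$FILE" "$HOME/Mail/inbox/$(next_mh_number "$HOME/Mail/inbox")"
```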

I hope this helps someone.  If I'm doing things that are sub-optimal or
just plain wrong, I'd be pleased to hear about it.  It all seems to
work very well, thanks to David and all the other developers who have
made Bogofilter what it is.

- Richard.
-- 
Richard Kimber
http://www.psr.keele.ac.uk/
_______________________________________________
Bogofilter mailing list
Bogofilter at bogofilter.org
http://www.bogofilter.org/mailman/listinfo/bogofilter


