Using Bogofilter with Kmail (my setup)

Nigel Henry cave.dnb at tiscali.fr
Tue Sep 19 16:44:18 CEST 2006


Seeing a request for someone using Bogofilter with Kmail, this is the setup I 
use.  I did have some initial problems setting it up, but with help from this 
list, and the KDE one, along with a bit of logical thinking, Bogofilter is 
doing its job very well, and I find it a lot faster than SpamAssassin (SA).

On FC2, I'm using Bogofilter 1.0.2 built from the tarball. Kmail is 1.6.2 
(from KDE 3.2.2-14.FC2.2.legacy, Red Hat).

Having downloaded and installed Bogofilter, I used the following procedure. 
I don't recollect having to make any changes to /etc/bogofilter.cf.example, 
but I've put the relevant part below in case anyone has any comments on it.

#### CUTOFF Values
#
# both ham_cutoff and spam_cutoff are allowed.
# setting ham_cutoff to a non-zero value will
# enable tri-state results (Spam/Ham/Unsure).
#
#ham_cutoff = 0.45   # default
#spam_cutoff= 0.99   # default
#
# for two-state classification:
#
##ham_cutoff  = 0.00   # default
##spam_cutoff = 0.99   # default
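
The three-way decision that the excerpt describes can be sketched as a small 
shell function. This is an illustration only, not Bogofilter's own code: 
classify_score is a made-up helper name, the cutoffs are the ones from the 
excerpt, and the handling of a score that lands exactly on a cutoff is an 
assumption.

```shell
#!/bin/sh
# Sketch only: mimic the Spam/Ham/Unsure decision described in the excerpt.
# classify_score is a hypothetical helper, not a Bogofilter command.
classify_score() {
    score=$1
    ham_cutoff=${2:-0.45}     # value from the excerpt
    spam_cutoff=${3:-0.99}    # value from the excerpt
    awk -v s="$score" -v h="$ham_cutoff" -v p="$spam_cutoff" 'BEGIN {
        if (s >= p)                print "Spam"
        else if (h == 0 || s <= h) print "Ham"     # ham_cutoff 0: two-state
        else                       print "Unsure"
    }'
}
```

With the cutoffs above, "classify_score 0.70" prints Unsure, while 
"classify_score 0.70 0.00 0.99" (the two-state setting) prints Ham.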

Back to the plot.

First, create a new directory named .bogofilter in your /home/username 
directory. This will hold the wordlist database once you start training 
Bogofilter.
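
From a terminal, that step might look like the sketch below. It assumes the 
default location; BOGOFILTER_DIR is the environment variable Bogofilter 
consults for a different directory, and the chmod is an optional extra that 
is not part of my original setup.

```shell
#!/bin/sh
# Create the wordlist directory if it is missing.
# BOGOFILTER_DIR only matters if you keep the database somewhere other
# than ~/.bogofilter, which is the location used in this post.
bogodir="${BOGOFILTER_DIR:-$HOME/.bogofilter}"
mkdir -p "$bogodir"
chmod 700 "$bogodir"    # optional: keep the wordlist private
ls -ld "$bogodir"
```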

Second, create some new folders in Kmail. Kmail uses maildir format by default 
for folders, so you just have to type the new folder name and OK each one 
created.
First, 2 folders: "Spam" and "NonSpam". These 2 are for Bogofilter's 
initial training. Put a bunch of your good mail in the NonSpam one, and all 
the spam that turns up in your inbox in the Spam one (in my case this was 
before I started using Bogofilter, as once you start filtering the mail, 
spam should no longer turn up in the inbox). Try not to populate the "Spam" 
folder with a bunch of spam you've downloaded from the Internet. It's much 
better to use your own spam that's turned up in your inbox.

Make 3 more folders, named "spam", "nonspamnew", and "unsure". Note that 
"spam" (lower case) is a different folder from "Spam". The "spam" and 
"unsure" folders are where Bogofilter will file the spam, and the mail that 
it is unsure about. All the ham will be filtered to the inbox where it 
belongs.

This is the way I use the various folders that I've created. Having populated 
the first 2 (Spam and NonSpam) with, say, 200 spam mails and 200 good mails, 
I then ran, as user, from my home directory:
bogofilter -sv -B Mail/Spam/cur
This creates wordlist.db in /home/user/.bogofilter, and adds a load of 
words from your spam to the db.  Then run:
bogofilter -nv -B Mail/NonSpam/cur
which will add a load of words from your good e-mails to the database.

I only run these 2 commands once on the "Spam" and "NonSpam" folders, 
thereafter running them on the "spam" and "nonspamnew" folders as below:
bogofilter -sv -B Mail/spam/cur
bogofilter -nv -B Mail/nonspamnew/cur
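
If you run these often, the two commands can be wrapped in a small helper. 
The following is only a sketch: train_folder and MAILDIR_ROOT are names I've 
made up for the example, and the empty-folder check is an extra precaution. 
The bogofilter options are the same ones used above.

```shell
#!/bin/sh
# Sketch: register one Kmail maildir folder as spam (-s) or non-spam (-n).
# train_folder and MAILDIR_ROOT are hypothetical names for this example.
MAILDIR_ROOT="${MAILDIR_ROOT:-$HOME/Mail}"

train_folder() {
    flag=$1      # -s for spam, -n for non-spam
    folder=$2    # folder name under Mail, e.g. spam or nonspamnew
    dir="$MAILDIR_ROOT/$folder/cur"

    # Skip missing or empty folders so there is nothing to go wrong.
    if [ ! -d "$dir" ] || [ -z "$(ls -A "$dir")" ]; then
        echo "nothing to train in $dir"
        return 0
    fi
    bogofilter "$flag" -v -B "$dir"
}
```

Typical use would then be "train_folder -s spam" followed by 
"train_folder -n nonspamnew".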

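To check the contents of wordlist.db you can run these 2 commands as user, 
from your home directory. The first shows how many spam and good messages 
have been registered:
bogoutil -w .bogofilter .MSG_COUNT
and the second dumps every word in the database along with its counts:
bogoutil -d .bogofilter/wordlist.db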

When you have the filters set up for Bogofilter, it will sort the mail into 
one of 3 places: the inbox, in the case of correctly identified good mail, 
the "spam" folder, in the case of correctly identified spam, or the "unsure" 
folder, in the case of all the stuff that Bogofilter is not sure about.

For a while I continued to run bogofilter -sv -B Mail/spam/cur on all the 
correctly identified spam to build up wordlist.db. I have now changed the 
filter that sends spam to the "spam" folder, so that it is sent to the 
wastebin instead, as I've never had anything but spam turn up in that folder. 
While you are still training this way, don't just delete the mail after 
running the command, but move it to the "Spam" folder, as it's useful there 
if you should need to recreate wordlist.db.

Don't delete the "spam" folder, as we'll now use it together with the 
"nonspamnew" folder as a holding place while we sift through the "unsure" 
folder.

Very little turns up in my "unsure" folder. Mainly it's spam, but sometimes 
there is some good mail that Bogofilter couldn't make up its mind about. The 
spam I move to the "spam" folder. The good mail I copy (repeat, copy) to the 
inbox, then move it from the "unsure" folder to the "nonspamnew" one, so that 
I can continue training Bogofilter with it. Then from time to time I run:
bogofilter -sv -B Mail/spam/cur
and
bogofilter -nv -B Mail/nonspamnew/cur
which will update wordlist.db with all of the stuff that Bogofilter was 
unsure about.

Each time you update the wordlist.db with the above 2 commands, empty the 
"spam" and the "nonspamnew" folders by moving the stuff in the "spam" folder 
to the "Spam" one, and the stuff in the "nonspamnew" one to the "NonSpam" 
one, thus leaving the "spam" and "nonspamnew" folders empty, and waiting for 
the next lot of stuff you've sifted from the "unsure" folder.

I keep the original "Spam" and "NonSpam" folders as they provide a basis for 
creating a new wordlist.db, should it become corrupted, and will contain all 
the latest additions.
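
Such a rebuild might look like the sketch below. I haven't needed to do this 
myself, so treat it as an outline under assumptions: rebuild_wordlist and 
MAILDIR_ROOT are made-up names, and moving the whole .bogofilter directory 
aside (rather than deleting wordlist.db) is just a cautious choice so that 
the old files can still be inspected.

```shell
#!/bin/sh
# Sketch: rebuild wordlist.db from the archive folders "Spam" and "NonSpam".
# rebuild_wordlist and MAILDIR_ROOT are hypothetical names for this example.
rebuild_wordlist() {
    bogodir="${BOGOFILTER_DIR:-$HOME/.bogofilter}"
    mail="${MAILDIR_ROOT:-$HOME/Mail}"

    # Move the old database directory aside instead of deleting it.
    if [ -d "$bogodir" ]; then
        mv "$bogodir" "$bogodir.old.$(date +%Y%m%d%H%M%S)" || return 1
    fi
    mkdir -p "$bogodir" || return 1

    # Same training commands as the initial setup, then show the counts.
    bogofilter -sv -B "$mail/Spam/cur" &&
    bogofilter -nv -B "$mail/NonSpam/cur" &&
    bogoutil -w "$bogodir" .MSG_COUNT
}
```

Run it as user with "rebuild_wordlist". With a few hundred mails in each 
folder it should not take long.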

Looking back over what I've written above, it looks terribly complicated. 
Believe me, it isn't. It just looks like that because I've tried not to miss 
any of the details.

To easily move mail around in Kmail, I added to the toolbar the item "Move 
Message to Folder". Right clicking on this makes it one step less to move 
mail around.  Also, if you have a bunch of mail in the "spam" folder that you 
want to move to the "Spam" one, a CTRL + A will highlight all the mail in the 
"spam" folder, and a right click on the highlighted stuff will allow you to 
send it in one go to the "Spam" folder.

Bogofilter's Filters in Kmail.

If you already have some filters set up in Kmail for sorting personal mail to 
specific folders, make sure that they are before the Bogofilter ones. Also of 
the Bogofilter ones, make sure that the one named "bogofilter" which pipes 
the mail through Bogofilter is the first in line.

I had some initial problems getting the filters set up correctly, but the ones 
below that I am using, do the job well.

Use Kmail's Settings > Configure Filters to set them up.  To create a new 
filter, just click on "New" at the bottom of the filter list, then "Rename", 
and type in the name of the filter. Then just fill in the details for each of 
the Bogofilter filters. Click "Apply" after creating each one, and move on to 
the next.


Filter 1.  bogofilter
Filter Criteria:
Match all of the following
<any header>     matches regular expr     .*

Filter Actions:
remove header    X-Bogosity
remove header    X-Attachments
pipe through     /usr/local/bin/bogofilter -pev

Advanced Options:
If this filter matches, stop processing here    (Unchecked)

Filter 2.  bogofilter_is_spam
Filter Criteria:
Match all of the following
X-Bogosity       contains                 Spam

Filter Actions:
remove header    X-Bogosity
remove header    X-Attachments
file into folder spam

Advanced Options:
Apply this filter to incoming messages          (Checked)
on manual filtering                             (Checked)
If this filter matches, stop processing here    (Checked)

Filter 3.  bogofilter_is_ham
Filter Criteria:
Match all of the following
X-Bogosity       contains                 Ham

Filter Actions:
remove header    X-Bogosity
remove header    X-Attachments
file into folder inbox

Advanced Options:    (As Filter 2)

Filter 4.  bogofilter_is_unsure
Filter Criteria:
Match all of the following
X-Bogosity       contains                 Unsure

Filter Actions:
remove header    X-Bogosity
remove header    X-Attachments
file into folder unsure

Advanced Options:    (As Filter 2)

Now that you have the filters set up, check that the "bogofilter" filter is 
in fact the first of the Bogofilter ones, then close the filter setup screen, 
and check the mail.
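
If you want to see what Bogofilter says about a single message without going 
through Kmail, something like the sketch below mirrors the four filters on 
the command line. It is an illustration rather than part of my setup: 
folder_for_message is a made-up helper, it assumes bogofilter is on your 
PATH, and it assumes the X-Bogosity header starts with Spam, Ham or Unsure 
as the filters above expect.

```shell
#!/bin/sh
# Sketch: reproduce filters 1 to 4 outside Kmail for one message file.
# folder_for_message is a hypothetical helper name.
folder_for_message() {
    # -p passes the message through with an X-Bogosity header added,
    # -e makes bogofilter exit 0 whether the result is spam, ham or unsure.
    # Only the header block is examined; the last X-Bogosity line is used
    # in case the incoming message already carried one.
    header=$(bogofilter -p -e < "$1" |
             sed -n '/^$/q; /^X-Bogosity:/p' | tail -n 1)
    case $header in
        *Spam*)   echo spam ;;
        *Unsure*) echo unsure ;;
        *Ham*)    echo inbox ;;
        *)        echo "no X-Bogosity header found" >&2; return 1 ;;
    esac
}
```

Calling "folder_for_message" with the path of one message file from a "cur" 
directory prints the folder that the Kmail filters would pick for it.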

I hope that this may be of help to anyone using Bogofilter with Kmail.

Perhaps it could be added to the FAQ if thought good enough.

Happy spam filtering.

Nigel.