Using Bogofilter with Kmail (my setup)
Nigel Henry
cave.dnb at tiscali.fr
Tue Sep 19 16:44:18 CEST 2006
Seeing a request for someone using Bogofilter with Kmail, this is the setup I
use. I did have some initial problems setting it up, but with help from this
list, and the KDE one, along with a bit of logical thinking, Bogofilter is
doing its job very well, and I find it a lot faster than SpamAssassin (SA).
On FC2, I'm using the tarball version of Bogofilter, 1.0.2. Kmail is 1.6.2
(using KDE 3.2.2-14.FC2.2.legacy Red Hat).
Having downloaded and installed Bogofilter, I used the following procedure. I
don't recollect having to make any changes to /etc/bogofilter.cf.example, but
I've put the relevant part below, in case anyone has any comments on it.
#### CUTOFF Values
#
# both ham_cutoff and spam_cutoff are allowed.
# setting ham_cutoff to a non-zero value will
# enable tri-state results (Spam/Ham/Unsure).
#
#ham_cutoff = 0.45 # default
#spam_cutoff= 0.99 # default
#
# for two-state classification:
#
##ham_cutoff = 0.00 # default
##spam_cutoff = 0.99 # default
Back to the plot.
First, create a new directory named .bogofilter in your /home/username
directory. This will contain the wordlist database once you start training
Bogofilter.
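From a shell, that step might look like the following. This is only a minimal sketch; the ls -ld line is there to confirm the result and is not part of the original setup.

```shell
# Create the directory that will hold the wordlist database.
# -p makes this a no-op if the directory already exists.
mkdir -p "$HOME/.bogofilter"

# Confirm that it is there.
ls -ld "$HOME/.bogofilter"
```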
Second, create some new folders in Kmail. Kmail uses maildir format by
default for folders, so you just have to type each new folder's name and
click OK.
First, two folders: "Spam" and "NonSpam". These two are for Bogofilter's
initial training. Put a bunch of your good mail in the NonSpam one, and all
the spam that turns up in your inbox in the Spam one (in my case this was
before I started using Bogofilter, since once you start filtering the mail,
no spam will turn up in the inbox). Try not to populate the "Spam" folder
with a bunch of spam you've downloaded from the Internet. It's much better to
use your own spam that has turned up in your inbox.
Make 3 more folders, named "spam", "nonspamnew", and "unsure" (note that the
lower-case "spam" is a different folder from "Spam"). The "spam" and "unsure"
ones are where Bogofilter will file the spam, and the mail that it is unsure
about, respectively. All the ham will be filtered to the inbox, where it
belongs.
This is the way I use the various folders that I've created. Having populated
the first 2 (Spam and NonSpam) with, say, 200 spam mails and 200 good mails,
I then ran, as my normal user (not root) and from my home directory:
bogofilter -sv -B Mail/Spam/cur
This creates the wordlist.db in /home/user/.bogofilter, and adds a load of
words from your spam to the db. Then run:
bogofilter -nv -B Mail/NonSpam/cur
Which will add a load of words from your good e-mails to the database.
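Before that first training run it can be handy to check how many messages each folder actually holds, to compare against the 200 or so of each suggested above. A small sketch, assuming Kmail keeps its folders under ~/Mail (as the relative paths in the commands above suggest); count_msgs and MAIL_ROOT are illustrative names, not part of Bogofilter or Kmail.

```shell
#!/bin/sh
# Count the messages (regular files) in a maildir "cur" directory.
# Prints 0 if the directory does not exist.
count_msgs() {
    if [ -d "$1" ]; then
        find "$1" -type f | wc -l | tr -d ' '
    else
        echo 0
    fi
}

# Assumption: Kmail's mail directory is ~/Mail.
MAIL_ROOT="${MAIL_ROOT:-$HOME/Mail}"

echo "Spam:    $(count_msgs "$MAIL_ROOT/Spam/cur")"
echo "NonSpam: $(count_msgs "$MAIL_ROOT/NonSpam/cur")"
```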
I only ran these 2 commands once on the "Spam" and "NonSpam" folders,
thereafter running them on the "spam" and "nonspamnew" folders, as below:
bogofilter -sv -B Mail/spam/cur
bogofilter -nv -B Mail/nonspamnew/cur
To check the contents of the wordlist.db you can run these 2 commands as user:
bogoutil -w .bogofilter .MSG_COUNT
and
bogoutil -d .bogofilter/wordlist.db
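The first of those two commands can be wrapped so that it gives a readable message when no database exists yet. This is a sketch only; show_msg_count is a made-up helper name, and the default directory is the ~/.bogofilter one created earlier.

```shell
#!/bin/sh
# Print the message counts stored in the word list, if one exists.
# $1 = Bogofilter directory (defaults to ~/.bogofilter).
show_msg_count() {
    dir="${1:-$HOME/.bogofilter}"
    if [ ! -f "$dir/wordlist.db" ]; then
        echo "no wordlist.db in $dir yet; train Bogofilter first" >&2
        return 1
    fi
    # Same command as above, with the directory given explicitly.
    bogoutil -w "$dir" .MSG_COUNT
}

# Example: show_msg_count
# Example: show_msg_count /home/username/.bogofilter
```

The dump from bogoutil -d can be very long once the database has grown, so piping it through head or less (for example bogoutil -d .bogofilter/wordlist.db | less) keeps it manageable.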
When you have the filters set up for Bogofilter, it will sort the mail to one
of three places: the inbox, in the case of correctly identified good mail;
the "spam" folder, in the case of correctly identified spam; or the "unsure"
folder, in the case of all the stuff that Bogofilter is not sure about.
For a while I continued to run bogofilter -sv -B Mail/spam/cur on all the
correctly identified spam, to build up the wordlist.db. I have now changed
the filter that sends spam to the "spam" folder so that it sends it to the
wastebin instead, as I've never had anything but spam turn up in that folder.
While you are still training on the "spam" folder, don't just delete the mail
after running the command. Move it to the "Spam" folder, as it's useful
there if you should ever need to recreate the wordlist.db.
Don't delete the "spam" folder, as we'll now use it in conjunction with the
"nonspamnew" folder, as holding places for when we sift through the "unsure"
folder.
Very little turns up in my "unsure" folder. Mainly it's spam, but sometimes
some wrongly identified good mail. The spam, I move to the "spam" folder, the
good mail, I copy, repeat, copy to the inbox, then move it from the "unsure"
folder to the "nonspamnew" one, so that I can continue training Bogofilter
with it. Then from time to time I run:
bogofilter -sv -B Mail/spam/cur
and
bogofilter -nv -B Mail/nonspamnew/cur
which will update the wordlist.db with all of the stuff that Bogofilter was
unsure about.
Each time you update the wordlist.db with the above 2 commands, empty the
"spam" and the "nonspamnew" folders by moving the stuff in the "spam" folder
to the "Spam" one, and the stuff in the "nonspamnew" one to the "NonSpam"
one, thus leaving the "spam" and "nonspamnew" folders empty, and waiting for
the next lot of stuff you've sifted from the "unsure" folder.
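The train-then-empty routine above could also be scripted instead of done by hand in Kmail. The following is an untested sketch under several assumptions: Kmail's folders live under ~/Mail, Kmail is closed while it runs, and Kmail copes with messages being moved between maildir folders behind its back (not verified here for Kmail 1.6.2). The names train_and_archive, BOGOFILTER and MAIL_ROOT are illustrative only.

```shell
#!/bin/sh
# Sketch: train on a holding folder, then move its messages to the
# matching archive folder. Run it with Kmail closed.
BOGOFILTER="${BOGOFILTER:-bogofilter}"   # override to test without Bogofilter
MAIL_ROOT="${MAIL_ROOT:-$HOME/Mail}"     # assumption: Kmail's mail directory

# $1 = training flags (-sv or -nv)
# $2 = holding folder (spam or nonspamnew)
# $3 = archive folder (Spam or NonSpam)
train_and_archive() {
    src="$MAIL_ROOT/$2/cur"
    dst="$MAIL_ROOT/$3/cur"
    if [ ! -d "$src" ] || [ ! -d "$dst" ]; then
        echo "missing $src or $dst" >&2
        return 1
    fi
    rc=0
    "$BOGOFILTER" "$1" -B "$src" || rc=$?
    # Exit status 3 is Bogofilter's documented error status; larger values
    # (such as 127, command not found) come from the shell.
    if [ "$rc" -ge 3 ]; then
        echo "training failed (status $rc); nothing moved" >&2
        return 1
    fi
    for f in "$src"/*; do
        if [ -f "$f" ]; then
            mv "$f" "$dst"/
        fi
    done
}

# Example: train_and_archive -sv spam Spam
# Example: train_and_archive -nv nonspamnew NonSpam
```

Training before moving means a failed run leaves the holding folder untouched, so nothing is archived without having been added to the wordlist.db.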
I keep the original "Spam" and "NonSpam" folders, as they provide a basis for
creating a new wordlist.db should it become corrupted, and they will contain
all the latest additions.
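Recreating the database from those two folders might look roughly like this. It is a sketch, not a tested recovery procedure: it sets the whole old .bogofilter directory aside rather than deleting anything, then repeats the two initial training commands. BOGO_DIR, MAIL_ROOT and rebuild_wordlist are illustrative names; -d is Bogofilter's option for naming the wordlist directory.

```shell
#!/bin/sh
# Sketch: rebuild wordlist.db from the archive folders "Spam" and "NonSpam".
BOGOFILTER="${BOGOFILTER:-bogofilter}"      # override to test without Bogofilter
MAIL_ROOT="${MAIL_ROOT:-$HOME/Mail}"        # assumption: Kmail's mail directory
BOGO_DIR="${BOGO_DIR:-$HOME/.bogofilter}"   # hypothetical variable name

rebuild_wordlist() {
    # Keep the old directory under a timestamped name instead of deleting it.
    if [ -d "$BOGO_DIR" ]; then
        mv "$BOGO_DIR" "$BOGO_DIR.old.$(date +%Y%m%d%H%M%S)"
    fi
    mkdir -p "$BOGO_DIR"

    # Same two commands as the initial training, with the directory explicit.
    "$BOGOFILTER" -d "$BOGO_DIR" -sv -B "$MAIL_ROOT/Spam/cur"
    "$BOGOFILTER" -d "$BOGO_DIR" -nv -B "$MAIL_ROOT/NonSpam/cur"
}

# Example: rebuild_wordlist
```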
Looking back over what I've written above, it looks terribly complicated.
Believe me, it isn't. It just looks like that because I've tried not to miss
any of the details.
To easily move mail around in Kmail, I added to the toolbar the item "Move
Message to Folder". Right clicking on this makes it one step less to move
mail around. Also, if you have a bunch of mail in the "spam" folder that you
want to move to the "Spam" one, a CTRL + A will highlight all the mail in the
"spam" folder, and a right click on the highlighted stuff will allow you to
send it in one go to the "Spam" folder.
Bogofilter's Filters in Kmail.
If you already have some filters set up in Kmail for sorting personal mail to
specific folders, make sure that they are before the Bogofilter ones. Also of
the Bogofilter ones, make sure that the one named "bogofilter" which pipes
the mail through Bogofilter is the first in line.
I had some initial problems getting the filters set up correctly, but the ones
below that I am using, do the job well.
Use Kmail's Settings > Configure Filters to set them up. To create a new
filter, just click on "new" at the bottom of the filter list, then "rename" ,
and type in the name of the filter. Then just fill in the details for each of
the Bogofilter filters. Click "Apply" after creating each one, and move onto
the next.
Filter 1. bogofilter
  Filter Criteria:
    Match all of the following
    <any header> matches regular expr .*
  Filter Actions:
    remove header X-Bogosity
    remove header X-Attachments
    pipe through /usr/local/bin/bogofilter -pev
  Advanced Options:
    If this filter matches, stop processing here (Unchecked)

Filter 2. bogofilter_is_spam
  Filter Criteria:
    Match all of the following
    X-Bogosity contains Spam
  Filter Actions:
    remove header X-Bogosity
    remove header X-Attachments
    file into folder spam
  Advanced Options:
    apply this filter to incoming messages (Checked)
    on manual filtering (Checked)
    If this filter matches, stop processing here (Checked)

Filter 3. bogofilter_is_ham
  Filter Criteria:
    Match all of the following
    X-Bogosity contains Ham
  Filter Actions:
    remove header X-Bogosity
    remove header X-Attachments
    file into folder inbox
  Advanced Options: (As Filter 2)

Filter 4. bogofilter_is_unsure
  Filter Criteria:
    Match all of the following
    X-Bogosity contains Unsure
  Filter Actions:
    remove header X-Bogosity
    remove header X-Attachments
    file into folder unsure
  Advanced Options: (As Filter 2)
Now that you have the filters set up, check that the filter named
"bogofilter" is in fact the first of the Bogofilter ones, then close the
filter setup screen, and check the mail.
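To see what the four filters are doing, the same decision can be mimicked from a shell. This is an illustration of the logic only, not how Kmail implements its filters. It assumes the classification word (Spam, Ham or Unsure) is the first field of the X-Bogosity header, which is what filters 2 to 4 match on; bogosity_of and folder_for are made-up helper names.

```shell
#!/bin/sh
# Print the classification word from the first X-Bogosity header on stdin.
bogosity_of() {
    sed -n 's/^X-Bogosity:[[:space:]]*\([A-Za-z]*\).*/\1/p' | head -n 1
}

# Map a classification word to the folder used by filters 2 to 4.
folder_for() {
    case "$1" in
        Spam)   echo spam ;;
        Ham)    echo inbox ;;
        Unsure) echo unsure ;;
        # Assumption: mail without a recognised header stays in the inbox.
        *)      echo inbox ;;
    esac
}

# Example: folder_for "$(bogosity_of < some-message)"
```

With Bogofilter installed, something like bogofilter -pev < some-message | bogosity_of should print the classification for a single saved message, where some-message is a placeholder for one of the files in a maildir folder. That is a quick way to check the pipe command from filter 1 before relying on it in Kmail.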
I hope that this may be of help to anyone using Bogofilter with Kmail.
Perhaps it could be added to the FAQ if thought good enough.
Happy spam filtering.
Nigel.