Site Configuration Queries

David Relson relson at osagesoftware.com
Mon May 19 05:36:31 CEST 2003


Hello Bryan,

I think your plan is workable.

As you know, bogofilter depends on its wordlists to classify messages as 
ham or spam.  The accuracy of its classifications depends on the quality and 
quantity of the wordlists and that, in turn, depends on the messages used 
to train bogofilter.  Quality matters because bogofilter works best when 
the ham messages it's trained with really are ham and the spam messages 
really are spam.  Quantity matters because the more words it knows, the 
better it can do.

Using forwarded messages to correct bogofilter's errors is not ideal but 
should work.  Forwarding changes the headers, so when "bogofilter -Ns" is 
run the token count changes are wrong for some of the header fields but 
fine for the message body.  The tokens in the message bodies will enable 
bogofilter to recognize future spam, and "bogofilter -u" will train on both 
header and body.  Over time, bogofilter will learn what is spam and what is 
not.  The changed headers will just make the process take longer.

An alternative approach is to have someone monitor bogofilter's results and 
make the needed corrections on the server.  This is labor-intensive but, 
done well, can produce higher quality results in less time.

David

At 08:25 PM 5/18/03, Bryan Roberts wrote:

>Hi,
>
>I've installed bogofilter and shared the .db files between multiple users
>on a Linux box.
>
>I've trained the system with a bunch of ham mailboxes. I don't have any spam
>mailboxes handy.
>
>I've added the following .procmailrc entries for the participating users.
>:0fw
>| bogofilter -u -e -p -l
>
>:0e
>{ EXITCODE=75 HOST }
>
>:0:
>* ^X-Bogosity: Yes, tests=bogofilter
>spam
>
>
>Now I would like to set up two mailboxes that reclassify any misclassified mail.
>
>spam at foo.com has the following .procmailrc
>:0HB:
>* ? bogofilter -Ns
>spam
>
>ham at foo.com has the following .procmailrc
>:0HB:
>* ? bogofilter -Sn
>ham
>
>If a user gets a spam in the inbox ideally they just forward it to
>spam at foo.com and the mail is automatically reclassified.
>
>Will this actually work, given that the forwarded message has details that
>belong to the user who forwarded it, including an altered subject line,
>their email address and possibly a signature?

The forwarded details shouldn't matter too much.  Suppose joe at aotea.co.nz 
gets 100 messages, all from badguy at spam.com.  Initially they're all 
classified as ham, though 5 should have been spam.  joe forwards those 5 to 
spam at aotea.co.nz and they are passed through "bogofilter -Ns".  The 
counts for "joe" will change from 100/0 (ham/spam) to 95/5, which is what 
you want.  "badguy" and "spam.com" may still have counts of 100/0 
(depending on the headers in the forwarded messages).  Future spam from 
badguy at spam.com will contain some tokens that bogofilter considers ham, 
for example "badguy" and "spam.com", alongside all the words in the message 
body, which it has now learned as spam.  The more numerous body words will 
(sooner or later) outweigh the relatively few header words (like "badguy" 
and "spam.com") and allow the new message to be classified as spam.  Over 
time, the spamicity of "badguy" and "spam.com" will move from 100% ham 
towards 100% spam.
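
To make the arithmetic concrete, here is a rough Python sketch of the 
bookkeeping.  It is not bogofilter's code: the function name, the two 
Counter wordlists and the body token "pills" are made up for illustration, 
and the only assumption is that "-Ns" lowers the ham count and raises the 
spam count of every token in the message it is given.

```python
from collections import Counter

def reclassify_as_spam(ham, spam, tokens):
    """Illustrative stand-in for "bogofilter -Ns" on one message:
    unregister each token as ham, then register it as spam."""
    for tok in tokens:
        ham[tok] = max(0, ham[tok] - 1)  # never drop below zero
        spam[tok] += 1

# Hypothetical wordlists after 100 messages were all registered as ham.
# "pills" stands for a body word that only occurs in the 5 spam messages.
ham = Counter({"joe": 100, "badguy": 100, "spam.com": 100, "pills": 5})
spam = Counter()

# The 5 forwarded copies carry joe's header and the original body,
# but (in this scenario) not the original sender's address.
for _ in range(5):
    reclassify_as_spam(ham, spam, ["joe", "pills"])

for tok in ("joe", "badguy", "spam.com", "pills"):
    print(f"{tok}: {ham[tok]}/{spam[tok]} (ham/spam)")
```

The header tokens that forwarding hides ("badguy", "spam.com") keep their 
old counts, while the body token is corrected completely.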

I hope this has been helpful!

David

>If this won't work, does anyone have a procmail script that would strip the
>forwarding information before passing the message to bogofilter?
>
>Thanks,
>    Bryan Roberts
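
I don't have a procmail recipe for this, but if the users' mail clients 
forward "as attachment", the original message travels intact as a 
message/rfc822 part and can be pulled out before training.  Below is a 
minimal sketch using Python's standard email package; the function name 
extract_forwarded is made up, and inline forwards (where the client pastes 
the original into the body) are not handled and are returned unchanged.

```python
import email
from email import policy

def extract_forwarded(raw: bytes) -> bytes:
    """Return the first attached message/rfc822 part as raw bytes.

    If the message has no such attachment (for example an inline
    forward), the input is returned unchanged.
    """
    msg = email.message_from_bytes(raw, policy=policy.default)
    for part in msg.walk():
        if part.get_content_type() == "message/rfc822":
            # The payload of a message/rfc822 part is a one-element
            # list holding the embedded message object.
            inner = part.get_payload(0)
            return inner.as_bytes()
    return raw
```

A small wrapper that reads the message on stdin, calls this function and 
pipes the result to "bogofilter -Ns" could then take the place of the 
direct bogofilter call in the spam mailbox's .procmailrc.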
