Site Configuration Queries
David Relson
relson at osagesoftware.com
Mon May 19 05:36:31 CEST 2003
Hello Bryan,
I think your plan is workable.
As you know, bogofilter depends on its wordlists to classify messages as
ham or spam. The accuracy of its classifications depends on the quality and
quantity of the wordlists, which in turn depend on the messages used to
train bogofilter. Quality matters because bogofilter works best when the
ham messages it's trained with really are ham and the spam messages really
are spam. Quantity matters because the more words it knows, the better it
can do.
Using forwarded messages to correct bogofilter's errors is not ideal but
should work. Forwarding changes the headers, so when "bogofilter -Ns" is
run the token count changes for some of the header fields won't be right,
though they will be fine for the message body. The tokens in the message
bodies will enable bogofilter to recognize future spam, and "bogofilter -u"
will train on both header and body. Over time, bogofilter will learn what
is spam and what is not; the changed headers just make the process take
longer.
An alternate approach is to have someone monitor bogofilter's results and
make the needed corrections on the server. This is labor-intensive, but if
done well it can produce higher quality results in less time.
David
At 08:25 PM 5/18/03, Bryan Roberts wrote:
>Hi,
>
>I've installed bogofilter and shared the .db files between multiple users
>on a Linux box.
>
>I've trained the system with a bunch of ham mailboxes. I don't have any spam
>mailboxes handy.
>
>I've added the following .procmailrc entries for the participating users.
>:0fw
>| bogofilter -u -e -p -l
>
>:0e
>{ EXITCODE=75 HOST }
>
>:0:
>* ^X-Bogosity: Yes, tests=bogofilter
>spam
>
>
>Now I would like to set up two mailboxes that reclassify any misclassified mail.
>
>spam at foo.com has the following .procmailrc
>:0HB:
>* ? bogofilter -Ns
>spam
>
>ham at foo.com has the following .procmailrc
>:0HB:
>* ? bogofilter -Sn
>ham
>
>If a user gets a spam in the inbox ideally they just forward it to
>spam at foo.com and the mail is automatically reclassified.
>
>Will this actually work, given that the forwarded message has details that
>belong to the user who forwarded it, including an altered subject line,
>their email address and possibly a signature?
The forwarded details shouldn't matter too much. Suppose joe at aotea.co.nz
gets 100 messages, all from badguy at spam.com. Initially they're all
classified as ham, though 5 should have been spam. joe forwards those 5 to
spam at aotea.co.nz and the messages are passed through "bogofilter -Ns". The
counts for "joe" will change from 100/0 (ham/spam) to 95/5, which is what
you want. "badguy" and "spam.com" may still have counts of 100/0
(depending on the headers in the forwarded messages). Future spam from
badguy at spam.com will have some parts that bogofilter considers ham, for
example "badguy" and "spam.com", as well as all the words in the message
body. The more numerous body words will (sooner or later) outweigh the
relatively few header words (like "badguy" and "spam.com") and allow the
new message to be classified as spam. Over time, the spamicity of "badguy"
and "spam.com" will change from 100% ham towards 100% spam.
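To make the count arithmetic concrete, here is a toy model of the joe/badguy example. It is a sketch under loudly stated assumptions: the Wordlist class is hypothetical (not bogofilter's database format or scoring), tokens are hand-picked sets rather than real lexer output, each token counts once per message, the forwarded messages are assumed to carry "joe" and a new "fwd" token in their headers but not "badguy" or "spam.com", and unregistering is clamped at zero. The method names only loosely mirror bogofilter's -n, -N and -s options.

```python
from collections import Counter


class Wordlist:
    """Hypothetical toy wordlist: per-token ham/spam message counts."""

    def __init__(self):
        self.ham = Counter()
        self.spam = Counter()

    def register_ham(self, tokens):
        # Loosely like "-n": count each distinct token once per message.
        self.ham.update(set(tokens))

    def unregister_ham(self, tokens):
        # Loosely like "-N". Toy assumption: counts never drop below zero.
        for t in set(tokens):
            self.ham[t] = max(0, self.ham[t] - 1)

    def register_spam(self, tokens):
        # Loosely like "-s".
        self.spam.update(set(tokens))

    def counts(self, token):
        return self.ham[token], self.spam[token]


# Assumed token sets for the example in the text.
original_ham = ["joe", "badguy", "spam.com", "meeting", "lunch"]
original_spam = ["joe", "badguy", "spam.com", "cheap", "pills"]
# The forwarded copies lost badguy/spam.com from the header and gained "fwd".
forwarded_spam = ["joe", "fwd", "cheap", "pills"]

wl = Wordlist()
for _ in range(95):
    wl.register_ham(original_ham)
for _ in range(5):
    wl.register_ham(original_spam)  # misclassified as ham by "bogofilter -u"

# joe forwards the 5 spams; each is run through the equivalent of "-Ns".
for _ in range(5):
    wl.unregister_ham(forwarded_spam)
    wl.register_spam(forwarded_spam)

for tok in ["joe", "badguy", "spam.com", "cheap", "fwd"]:
    print(tok, wl.counts(tok))
```

In this toy, "joe" moves from 100/0 to 95/5 and the body words "cheap" and "pills" become pure spam, while "badguy" and "spam.com" stay at 100/0. The "fwd" token shows what "not quite right" header counts look like: a token the original messages never had gets spam counts, and without the clamp its ham count would have gone negative.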
I hope this has been helpful!
David
>If this won't work, does anyone have a procmail script that would strip the
>forwarding information before passing the message to bogofilter?
>
>Thanks,
> Bryan Roberts
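On the question of stripping the forwarding information: one option, sketched here in Python rather than pure procmail, works only if users forward spam as an attachment, so that the original arrives intact as a message/rfc822 part. That is an assumption; an inline forward cannot be unwrapped reliably this way, so the sketch falls back to passing the whole message through (re-serialized). The script name unforward.py is hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical unforward.py: print the original message inside a forward.

Assumes the spam was forwarded as an attachment (a message/rfc822 part).
Messages without such a part are passed through, re-serialized.
"""
import sys
from email import message_from_bytes, policy


def extract_forwarded(msg):
    """Return the first attached message/rfc822 part, or None."""
    for part in msg.walk():
        if part.get_content_type() == "message/rfc822":
            payload = part.get_payload()
            # For message/rfc822 parts the payload is a list holding one message.
            if isinstance(payload, list) and payload:
                return payload[0]
    return None


def unforward(raw: bytes) -> bytes:
    """Return the attached original as bytes, or the input message itself."""
    msg = message_from_bytes(raw, policy=policy.default)
    inner = extract_forwarded(msg)
    target = inner if inner is not None else msg
    return target.as_bytes()


if __name__ == "__main__":
    # Filter mode: read a message on stdin, write the unwrapped one to stdout.
    sys.stdout.buffer.write(unforward(sys.stdin.buffer.read()))
```

With such a script on the PATH, the condition line in spam at foo.com's recipe could become "* ? unforward.py | bogofilter -Ns" (and likewise with -Sn for ham at foo.com). procmail hands a condition containing "|" to the shell, and the pipeline's exit status is bogofilter's, so the recipe otherwise behaves as before.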