Bogofilter seems to not be working
David Relson
relson at osagesoftware.com
Tue Mar 25 22:42:48 CET 2003
At 04:15 PM 3/25/03, daniel wrote:
>So wrote David Relson on Tuesday 25 March 2003 at 04:03:59PM -0500:
>
>Sorry to sound so dense, yet I am using Mutt with standard mbox mail
>folders. Therefore each message is just a string of text in a big
>file. How can I run bogofilter on a particular message from the command
>line like this (I have tried piping to shell command from within Mutt via
>the ! command yet this does not pipe the message)?
Daniel,
You don't sound dense, merely new to bogofilter (and perhaps to mail
processing). Messages in mbox format need to be separated out before they
can be tested. If you want to process all the messages in an mbox, use
"formail", as in:
formail -s bogofilter -v -d ~/.bogofilter < mbox
If you want just one message, you'll have to use your favorite editor.
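Besides an editor, a short script can pull out a single message. The Python sketch below is a hypothetical helper, not part of bogofilter or formail. It uses the standard library's mailbox module to print message number N (counting from 0, in file order) from an mbox, so the result can be piped to bogofilter. The function name extract_message and the script name mboxpick.py are made up for illustration.

```python
#!/usr/bin/env python3
"""Hypothetical helper (mboxpick.py): print one message from an mbox file."""
import mailbox
import sys


def extract_message(mbox_path: str, index: int) -> bytes:
    """Return message number `index` (0-based, in file order) as raw bytes."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        keys = sorted(box.keys())  # mailbox.mbox numbers messages in file order
        if not 0 <= index < len(keys):
            raise IndexError(f"{mbox_path} has {len(keys)} messages; no #{index}")
        # get_bytes() returns the stored headers and body without the "From " line
        return box.get_bytes(keys[index])
    finally:
        box.close()


# Run as a command only when given exactly MBOX and INDEX.
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.stdout.buffer.write(extract_message(sys.argv[1], int(sys.argv[2])))
```

Used from the shell, something like "python3 mboxpick.py mbox 3 | bogofilter -v -d ~/.bogofilter" would score the fourth message in the file.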
>Here is a listing:
> bogoutil -p -w ~/.bogofilter mortgage investment
>            spam  good  Gra prob  Rob prob
>mortgage      12    15  0.651841  0.649409
>investment     1     2  0.400000  0.527607
This shows "mortgage" with (roughly) a 12/27 spam score and "investment"
with a 1/3 score. These counts are not consistent with your expectation
that the words are "strong indicators". Are you using "tag_header_lines"?
If so, these words appearing in Subject lines would be counted as the
tokens "subj:mortgage" and "subj:investment".
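The "roughly 12/27" and "1/3" figures can be checked with a few lines of Python. This sketch computes only the raw share of each word's occurrences that came from spam, using the counts from the listing above. The Gra prob and Rob prob columns differ from these numbers; as I understand bogofilter's Graham and Robinson calculations, they also scale each count by the total numbers of spam and good messages, which the listing does not show.

```python
from fractions import Fraction

# Counts copied from the bogoutil listing above: (spam count, good count)
LISTING = {
    "mortgage": (12, 15),
    "investment": (1, 2),
}


def raw_spam_fraction(spam: int, good: int) -> Fraction:
    """Share of a token's occurrences that were in spam (ignores message totals)."""
    return Fraction(spam, spam + good)


for word, (spam, good) in LISTING.items():
    frac = raw_spam_fraction(spam, good)
    print(f"{word:<11} {spam}/{spam + good} = {float(frac):.3f}")
```

Both fractions come out below one half (about 0.444 and 0.333), so by raw counts neither word leans toward spam.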
>both these words should be strong indicators of spam since I do not think
>I have any good e-mails that contain either word.
If these words aren't in good e-mails, then their "good" counts should be zero.
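With "tag_header_lines" in effect, a word seen in a Subject line is stored under a prefixed token, so looking up the bare word with bogoutil shows only part of the picture. The toy tokenizer below is hypothetical: it is not bogofilter's lexer, and its regular expression and tag_header_lines parameter are illustrative. It only shows the prefixing idea.

```python
import re

# Hypothetical, simplified tokenizer; NOT bogofilter's real lexer.
# Words of two or more letters; header continuation lines are ignored.
WORD = re.compile(r"[A-Za-z][A-Za-z'-]+")


def tokenize(message: str, tag_header_lines: bool = True) -> list[str]:
    header, _, body = message.partition("\n\n")
    tokens = []
    for line in header.splitlines():
        name, _, value = line.partition(":")
        # Subject words get a "subj:" prefix, making them distinct tokens
        prefix = "subj:" if tag_header_lines and name.lower() == "subject" else ""
        tokens += [prefix + w.lower() for w in WORD.findall(value)]
    tokens += [w.lower() for w in WORD.findall(body)]
    return tokens


msg = "Subject: Mortgage rates\n\nLow mortgage rates today\n"
print(tokenize(msg))
print(tokenize(msg, tag_header_lines=False))
```

In this model the Subject's "Mortgage" adds to "subj:mortgage" rather than to "mortgage", so querying the prefixed token as well would show the rest of the counts.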
>What concerns me is that the good MSG COUNT is incrementing by one when
>mail is retrieved even when the mail is spam. So if I do a <esc>-d on it
>the spam MSG COUNT increments by one yet that message has already been
>registered on the good COUNT. This seems self-defeating. It seems to me
>that <esc>-d should not only increment the spam COUNT by one and add all
>words in the given message to the bad list, it also should remove it from
>the good COUNT and all its words.
When using "-u", the spam message count should increase when a message is
classified as spam, and the good message count should increase when it is
not. If your good MSG COUNT increases for every message, spam included,
then something is set up incorrectly.
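The bookkeeping under discussion (an update that credits exactly one list per message, and the suggestion that a correction should also take the message back out of the good list) can be modelled with a small Python sketch. The WordList class and its method names are hypothetical; this is a toy model of message and token counts, not bogofilter's database format or command-line behaviour. Each token is counted at most once per message.

```python
from collections import Counter


class WordList:
    """Toy model of spam/good counts; hypothetical, not bogofilter's storage."""

    def __init__(self) -> None:
        self.msg_count = {"spam": 0, "good": 0}
        self.tokens = {"spam": Counter(), "good": Counter()}

    def register(self, tokens: set[str], kind: str) -> None:
        """Add one message to `kind` ('spam' or 'good'), like an update after classifying."""
        self.msg_count[kind] += 1
        self.tokens[kind].update(tokens)  # +1 per distinct token in the message

    def unregister(self, tokens: set[str], kind: str) -> None:
        """Undo an earlier register() of the same message."""
        self.msg_count[kind] -= 1
        self.tokens[kind].subtract(tokens)

    def reclassify(self, tokens: set[str], wrong: str, right: str) -> None:
        """Fix a misclassification: remove from the wrong list, add to the right one."""
        self.unregister(tokens, wrong)
        self.register(tokens, right)


wl = WordList()
spam_msg = {"mortgage", "rates"}
wl.register(spam_msg, "good")            # spam wrongly counted as good
wl.reclassify(spam_msg, "good", "spam")  # correction moves it to spam
print(wl.msg_count)
```

In this model a correction leaves both counts consistent: the good message count returns to where it was and the spam count goes up by one, with the same change applied to each token.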
As I don't use mutt, I can't help you with those details. Someone else
will have to chime in concerning <esc>-d.
>Also, I just removed the "tests=bogofilter" string from the procmail
>testing line and also the -u option but neither of these has had any
>effect. Spamicity is still 0.000000.
"-u" updates the appropriate wordlist after classifying a
message. Removing it has no effect on the score of the current
message. The combination "-u -l" generates logging messages that record how
bogofilter scored incoming mail and which wordlist was updated. I've found
"-u -l" to be a very useful combination.