Bogofilter seems to not be working

David Relson relson at osagesoftware.com
Tue Mar 25 22:42:48 CET 2003


At 04:15 PM 3/25/03, daniel wrote:

>So wrote David Relson on Tuesday 25 March 2003 at 04:03:59PM -0500:
>
>Sorry to sound so dense, yet I am using Mutt with standard mbox mail 
>folders.  Therefore each message is just a string of text in a big 
>file.  How can I run bogofilter on a particular message from the command 
>line like this (I have tried piping to shell command from within Mutt via 
>the ! command yet this does not pipe the message)?

Daniel,

You don't sound dense; merely new to bogofilter (and perhaps mail 
processing).  Messages stored in mbox format need to be separated out 
before they can be tested individually.  If you want to process all the 
messages in an mbox, use "formail", as in:

         formail -s bogofilter -v -d ~/.bogofilter < mbox

If you want just one message, you'll have to use your favorite editor to 
copy it out of the mbox into a file of its own.
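
If an editor is awkward, something along these lines might do instead.  It 
is only a rough sketch: "mbox_nth" is a made-up helper name, the mailbox 
path in the usage comment is hypothetical, and the splitting is naive (it 
assumes every line starting with "From " begins a new message, i.e. that 
body lines are escaped as ">From ").  The formail man page also documents 
"+skip" and "-total" for selecting messages while splitting, which may be 
simpler if your version has them.

```shell
# Print the Nth message (1-based) of an mbox file on stdout.
# Naive splitter: every line beginning with "From " is taken as the
# start of a new message.
mbox_nth() {
    awk -v n="$1" '
        /^From / { i++ }      # count message separators
        i > n    { exit }     # stop once past the wanted message
        i == n   { print }    # print lines of the wanted message
    ' "$2"
}

# Hypothetical usage: score the third message in ~/Mail/inbox
#   mbox_nth 3 ~/Mail/inbox | bogofilter -v -d ~/.bogofilter
```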


>Here is a listing:
>  bogoutil -p -w ~/.bogofilter mortgage investment
>                        spam    good  Gra prob  Rob prob
>mortgage                 12      15  0.651841  0.649409
>investment                1       2  0.400000  0.527607

This shows "mortgage" as spam with (roughly) a 12/27 score and "investment" 
with a 1/3 score.  The counts are not consistent with your expectation that 
these words are "strong indicators".  Are you using "tag_header_lines"?  If 
so, occurrences of these words in Subject lines would be counted under the 
tokens "subj:mortgage" and "subj:investment".
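
The rough 12/27 and 1/3 figures are nothing more than spam / (spam + good) 
from the two count columns.  A small sketch of that arithmetic follows; 
"spam_fraction" is a made-up helper, not part of bogofilter, and the "Gra 
prob" and "Rob prob" columns in the listing are evidently computed 
differently, so treat this only as a sanity check on the counts.

```shell
# Raw spam fraction of a token: spam / (spam + good).
# Back-of-envelope ratio only, not bogofilter's own scoring formula.
# Both counts being zero is not handled.
spam_fraction() {
    awk -v s="$1" -v g="$2" 'BEGIN { printf "%.3f\n", s / (s + g) }'
}

spam_fraction 12 15   # mortgage:   prints 0.444
spam_fraction 1 2     # investment: prints 0.333
```

A word that never appears in good mail would have a good count of 0 and a 
fraction of 1.000.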

>both these words should be strong indicators of spam since I do not think 
>I have any good e-mails that contain either word.

If these words aren't in good e-mails, then their "good" counts should be zero.

>What concerns me is that the good MSG COUNT is incrementing by one when 
>mail is retrieved even when the mail is spam.  So if I do a <esc>-d on it 
>the spam MSG COUNT increments by one yet that message has already been 
>registered on the good COUNT.  This seems self-defeating.  It seems to me 
>that <esc>-d should not only increment the spam COUNT by one and add all 
>words in the given message to the bad list, it also should remove it from 
>the good COUNT and all its words.

When using "-u", the spam message count should increase when the message is 
classified as spam and the good message count should increase when it is 
not.  If your good MSG COUNT increases for every message, spam included, 
then you have something set up incorrectly.

As I don't use mutt, I can't help you with those details.  Someone else 
will have to chime in concerning <esc>-d.

>Also, I just removed the "tests=bogofilter" string from the procmail 
>testing line and also the -u option but neither of these has had any 
>effect.  Spamicity is still 0.000000.

"-u" updates the appropriate wordlist after classifying a 
message.  Removing it has no effect on the score of the current 
message.  The combination of "-u -l" will generate logging messages that 
give you a record of how bogofilter has scored incoming mail and of 
which wordlist was updated.  I've found "-u -l" to be a very useful 
combination.




