New user and question

Doug dsc3507 at yahoo.com
Thu Oct 28 03:17:50 CEST 2010


Thanks for the comments and suggestions. 

I am seeing a big improvement as time goes on and I keep training. I changed the
spam/ham cutoff values and trained both spam and ham from the unsure list, and
things are working much better. Since I am using the -u option in the procmail
call, bumping the spam threshold down reinforces the spam hits. I guess that is
the same as training manually, but it makes it easier. I am getting very few
unsures now and no false spam or ham, with the exception of things I want to
force one way or the other. I also use RBL filtering directly in sendmail, as
well as a fairly extensive access list restricting many sites and IP blocks
before they even get to procmail. Even so, I get at least a hundred spams a day.
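
A rough illustration of why lowering the spam threshold acts like extra manual
training when -u is in use. This is only a sketch: the cutoff values, the
scores, and the function names are made up for the example and are not
bogofilter's defaults or its API, and the handling of scores exactly on a
cutoff is an assumption. The idea it relies on is that with -u a message is
registered only when it is classified as spam or ham, never when it is unsure.

```python
def classify(score, ham_cutoff, spam_cutoff):
    """Three-way decision on a spamicity score in the range 0..1."""
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "ham"
    return "unsure"


def auto_trained(scores, ham_cutoff, spam_cutoff):
    """Return the (score, verdict) pairs that -u style auto-update would register."""
    registered = []
    for score in scores:
        verdict = classify(score, ham_cutoff, spam_cutoff)
        if verdict != "unsure":  # unsure messages are left out of training
            registered.append((score, verdict))
    return registered


# Hypothetical scores for four incoming messages.
scores = [0.05, 0.60, 0.85, 0.97]
before = auto_trained(scores, ham_cutoff=0.10, spam_cutoff=0.95)
after = auto_trained(scores, ham_cutoff=0.10, spam_cutoff=0.80)
print(before)
print(after)
```

With the lower cutoff the 0.85 message moves from unsure to spam and is
registered automatically, which is the reinforcement described above. The
flip side is that any message wrongly pushed over the lower cutoff is also
trained as spam, so mistakes need to be corrected by hand.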

I turned off SpamAssassin, as it was not really doing anything and is not needed
with bogofilter working the way it is.


Doug

 

----- Original Message ----

From: RW <rwmaillists at googlemail.com>
To: bogofilter at bogofilter.org
Sent: Mon, October 25, 2010 6:57:19 PM
Subject: Re: New user and question

On Mon, 25 Oct 2010 16:03:32 -0400
Thomas Anderson <tanderson at orderamidchaos.com> wrote:

> On 10/25/2010 12:51 PM, Doug wrote:
> > I am new to Bogofilter. Had been using Spam Assassin for years and
> > wanted to try

You might want to try scoring Bogofilter into SpamAssassin, setting it
for multiple-word tokenization so that it complements SpamAssassin's
Bayes. I find that although Bogofilter (multiple word) and Bayes both
produce occasional false positives, they tend to do so on different
mails, so it's legitimate to have both.

I do something similar (I use DSPAM too), and I find that the mails
Bogofilter classifies as unsure usually pick up enough SA points to get
caught easily.

> > My problem is the unsures are not going down and the majority of
> > them have Viagra in the subject. It is not obfuscated in any way.
> > I see few if any Viagra emails in the spam mail.  Am I doing
> > something wrong? I have probably fed several hundred or more
> > unsures like this so far. Is there a way to strongly add a word?
> 
> I recommend training to exhaustion.  That is, when a false positive, 
> false negative, or unsure shows up, first you train it, then you
> check it again as if the same exact email arrived another time, and
> if it still doesn't classify correctly, train it again -- repeat
> until it classifies correctly.
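
The train-to-exhaustion loop described above could be sketched roughly as
follows. It assumes bogofilter is on the PATH, reads the message on stdin,
registers with -s (spam) or -n (ham), and reports its verdict through the exit
code (0 spam, 1 ham, 2 unsure, 3 error). The function names and the max_rounds
limit are invented for the example, and error handling is minimal.

```python
import subprocess

# Exit codes bogofilter uses when classifying a message.
EXIT_TO_VERDICT = {0: "spam", 1: "ham", 2: "unsure"}


def bogofilter_classify(message: bytes) -> str:
    """Classify one message by piping it to bogofilter on stdin."""
    result = subprocess.run(["bogofilter"], input=message)
    return EXIT_TO_VERDICT.get(result.returncode, "error")


def bogofilter_train(message: bytes, want: str) -> None:
    """Register the message as spam (-s) or ham (-n)."""
    flag = "-s" if want == "spam" else "-n"
    result = subprocess.run(["bogofilter", flag], input=message)
    if result.returncode == 3:
        raise RuntimeError("bogofilter reported an error while training")


def train_to_exhaustion(message, want, classify=bogofilter_classify,
                        train=bogofilter_train, max_rounds=10):
    """Train until `message` classifies as `want`.

    Returns the number of training rounds used, or None if the message
    still misclassifies after max_rounds rounds.
    """
    for rounds_used in range(max_rounds):
        if classify(message) == want:
            return rounds_used
        train(message, want)
    return max_rounds if classify(message) == want else None


# Dry run with stand-in functions, so the loop can be tried without bogofilter:
# the fake classifier only says "spam" after three training rounds.
trained = []
rounds = train_to_exhaustion(
    b"example message", "spam",
    classify=lambda m: "spam" if len(trained) >= 3 else "unsure",
    train=lambda m, want: trained.append(want),
)
print(rounds)
```

Against a real wordlist, call train_to_exhaustion(raw_message, "spam") with
the default classify and train functions. The max_rounds cap is there so that
a message which never converges does not get registered an unbounded number
of times.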

In my experience that's ineffective with default settings, because the
influence of new hapaxes and low-count tokens virtually guarantees
correct identification on the second test, unless you use a very large
value of "robs" that would be unsuitable for normal classification. It
makes more difference if you do it iteratively on corpora.
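
To make the hapax point concrete, here is a small sketch of Robinson-style
token smoothing, f(w) = (robs * robx + n * p(w)) / (robs + n), which is my
understanding of how "robs" and "robx" enter the per-token score. The message
counts are invented, and the robs = 0.0178 and robx = 0.52 values are what I
believe the shipped defaults to be, so check them against your own
bogofilter.cf before relying on the numbers.

```python
def token_score(spam_count, ham_count, n_spam_msgs, n_ham_msgs, robs, robx):
    """Robinson-smoothed spam score f(w) for a single token."""
    n = spam_count + ham_count
    if n == 0:
        return robx  # never-seen token: fall back to the prior robx
    bad = spam_count / n_spam_msgs   # fraction of spam messages with the token
    good = ham_count / n_ham_msgs    # fraction of ham messages with the token
    p = bad / (bad + good)           # raw estimate before smoothing
    return (robs * robx + n * p) / (robs + n)


ROBX = 0.52  # assumed default prior for unknown tokens

# A hapax: a token seen exactly once, in the spam that was just trained.
# The corpus sizes (1000 spam, 1000 ham) are hypothetical.
hapax_small_robs = token_score(1, 0, 1000, 1000, robs=0.0178, robx=ROBX)
hapax_large_robs = token_score(1, 0, 1000, 1000, robs=1.0, robx=ROBX)

print(f"robs=0.0178: {hapax_small_robs:.3f}")
print(f"robs=1.0:    {hapax_large_robs:.3f}")
```

With the small robs a single training pass pushes every token unique to that
message to about 0.99, so the immediate re-test is almost certain to pass and
the loop stops after one round. With robs = 1.0 the same token only reaches
0.76, which is why a much larger robs would be needed for repeated training
to have any effect.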