Question

Stephen Davies scldad at sdc.com.au
Thu May 21 05:44:48 CEST 2009


OK. I'll try that.

I guess I should point out that bogofilter is correctly detecting several 
hundred spams every day. It is only a relatively small number (say 30 per 
day) that raise this issue.
I have not had a single case of ham being incorrectly detected as spam in many 
thousands of emails.

I typically find that a single bogofilter -Ns run is sufficient to give correct 
detection, but I usually do one -Ns followed by four -s.

Despite that, I am still getting undetected spams.
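
For reference, here is a rough sketch of how robx and robs (discussed in the
quoted thread) feed into a single token's score. It assumes Robinson's
smoothing, f(w) = (robs*robx + n*p(w)) / (robs + n), which is my understanding
of what bogofilter does. The function name and the message totals of 1000 are
made up for illustration; this is not bogofilter's own code.

```python
def robinson_f(spam_count, ham_count, robx, robs,
               spam_msgs=1000, ham_msgs=1000):
    """Smoothed spamicity of one token (illustrative sketch only).

    spam_count / ham_count: messages of each class containing the token.
    spam_msgs / ham_msgs:   total trained messages per class (assumed values).
    """
    n = spam_count + ham_count
    if n == 0:
        # A never-seen token scores exactly robx.
        return robx
    spam_rate = spam_count / spam_msgs
    ham_rate = ham_count / ham_msgs
    p = spam_rate / (spam_rate + ham_rate)  # raw spamicity of the token
    # robs is the weight given to the prior robx, measured in "sightings".
    return (robs * robx + n * p) / (robs + n)


# A token seen once, in a message that was learned as ham:
print(round(robinson_f(0, 1, robx=0.7, robs=0.1), 3))   # 0.064
print(round(robinson_f(0, 1, robx=0.7, robs=0.22), 3))  # 0.126
```

If that formula is right, then with min-dev at 0.35 both scores are still
outside the ignored band of 0.15 to 0.85, so one wrong ham registration keeps
the token strongly hammy at either robs value.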

Cheers,
Stephen

On Thursday 21 May 2009 12:49:39 Thomas Anderson wrote:
> Try increasing your robs.  Mine is 0.22.
>
> Other than that, you just have to train on errors.  Your false negatives
> should decrease with training.  Try training til exhaustion, i.e. train
> the same email repeatedly until it classifies correctly.  This should
> prevent you from having to see the same email from many sources before
> it classifies correctly.
>
> Tom
>
> Stephen Davies wrote:
> > I understand.
> >
> > My initial issue is with the obvious spams not being detected first time
> > round.
> > The first I see of them is in my inbox as ham - despite being so
> > obviously spam.
> >
> > If I save the email and run it through bogofilter -vvv, I get the results
> > I posted.
> >
> > I then use bogofilter -Ns to "fix" the database and this seems to work -
> > until the next spam with the same pattern but from a different source
> > arrives. (bogofilter -vvv at this stage gives bogosity of 1.0).
> >
> > I have changed my min-dev, robx and robs to 0.35, 0.7, 0.1 but first
> > indications are that this is not enough.
> >
> > On Thursday 21 May 2009 10:56:51 RW wrote:
> >> On Thu, 21 May 2009 09:49:48 +0930
> >>
> >> Stephen Davies <scldad at sdc.com.au> wrote:
> >>> On Thursday 21 May 2009 06:33:00 Thomas Anderson wrote:
> >>>> You have to adjust your robx and robs values.  They will determine
> >>>> where never-before-seen and rarely-seen tokens get scored.  E.g. if
> >>>> you set your robx within your "unsure" zone, new tokens will never
> >>>> score as ham or spam.  And with your robs, you can ensure that
> >>>> tokens seen only a few times also remain less influential.
> >>>
> >>> Thanks Tom. I found the doco and that looks like what I need.
> >>
> >> Just to be clear though, these are not "never-before-seen and
> >> rarely-seen tokens", they are tokens from spams that have been learned
> >> as ham. If you have a setup where you expect high levels of
> >> mis-training, then tuning Bogofilter to mitigate this is sensible -
> >> otherwise I'd want to know why it's happening.
> >> _______________________________________________
> >> Bogofilter mailing list
> >> Bogofilter at bogofilter.org
> >> http://www.bogofilter.org/mailman/listinfo/bogofilter



-- 
=============================================================================
Stephen Davies Consulting P/L                             Voice: 08-8177 1595
Adelaide, South Australia.                                Fax  : 08-8177 0133
Computing & Network solutions.                            Mobile:040 304 0583
                                          VoIP:sip:1132210 at sip1.bbpglobal.com


