Question
Stephen Davies
scldad at sdc.com.au
Thu May 21 05:44:48 CEST 2009
OK. I'll try that.
I guess I should point out that bogofilter is correctly detecting several
hundred spams every day. It is only a relatively small number (say 30 per
day) that raise this issue.
I have not had a single case where ham is incorrectly detected as spam in many
thousand emails.
I typically find that a single -Ns run is sufficient to give correct detection
but usually do one -Ns followed by four -s.
Despite that, I am still getting undetected spams.
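For reference, the "train til exhaustion" loop Tom describes below could be sketched roughly like this. It is only an illustration, not anything shipped with bogofilter: the function names and the max_rounds cap are made up for the example, and it assumes the message is saved to a file, that the first pass should be -Ns (because the message was previously registered as ham) and later passes plain -s, and that the exit codes are as given in the man page (0 spam, 1 ham, 2 unsure).

```python
import subprocess

# Exit codes documented in the bogofilter man page.
SPAM, HAM, UNSURE = 0, 1, 2


def classify(path):
    """Return bogofilter's exit code for the message stored at `path`."""
    return subprocess.run(["bogofilter", "-I", path]).returncode


def register_spam(path, first):
    """Register the message as spam.

    Assumption: the first pass also unregisters it as ham (-Ns) because
    it was previously learned as ham; later passes use plain -s.
    Error handling is omitted in this sketch.
    """
    flags = "-Ns" if first else "-s"
    subprocess.run(["bogofilter", flags, "-I", path])


def train_to_exhaustion(path, classify=classify, register=register_spam,
                        max_rounds=10):
    """Re-register `path` as spam until it classifies as spam.

    Returns the number of training passes performed. `max_rounds` is a
    hypothetical safety cap so the loop cannot run forever.
    """
    rounds = 0
    while rounds < max_rounds and classify(path) != SPAM:
        register(path, rounds == 0)
        rounds += 1
    return rounds
```

Something like train_to_exhaustion("missed-spam.eml") would then replace the fixed "one -Ns followed by four -s" routine with however many passes the message actually needs.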
Cheers,
Stephen
On Thursday 21 May 2009 12:49:39 Thomas Anderson wrote:
> Try increasing your robs. Mine is 0.22.
>
> Other than that, you just have to train on errors. Your false negatives
> should decrease with training. Try training til exhaustion, i.e. train
> the same email repeatedly until it classifies correctly. This should
> prevent you from having to see the same email from many sources before
> it classifies correctly.
>
> Tom
>
> Stephen Davies wrote:
> > I understand.
> >
> > My initial issue is with the obvious spams not being detected first time
> > round.
> > The first I see of them is in my inbox as ham - despite being so
> > obviously spam.
> >
> > If I save the email and run it through bogofilter -vvv, I get the results
> > I posted.
> >
> > I then use bogofilter -Ns to "fix" the database and this seems to work -
> > until the next spam with the same pattern but from a different source
> > arrives. (bogofilter -vvv at this stage gives bogosity of 1.0).
> >
> > I have changed my min-dev, robx and robs to 0.35, 0.7, 0.1 but first
> > indications are that this is not enough.
> >
> > On Thursday 21 May 2009 10:56:51 RW wrote:
> >> On Thu, 21 May 2009 09:49:48 +0930
> >>
> >> Stephen Davies <scldad at sdc.com.au> wrote:
> >>> On Thursday 21 May 2009 06:33:00 Thomas Anderson wrote:
> >>>> You have to adjust your robx and robs values. They will determine
> >>>> where never-before-seen and rarely-seen tokens get scored. E.g. if
> >>>> you set your robx within your "unsure" zone, new tokens will never
> >>>> score as ham or spam. And with your robs, you can ensure that
> >>>> tokens seen only a few times also remain less influential.
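To make the effect of the two settings concrete, here is a small sketch, assuming the token score follows Robinson's smoothing formula f(w) = (s*x + n*p(w)) / (s + n), where x is robx, s is robs, n is the number of messages the token has been seen in and p(w) is the raw spam probability from the counts. The function name and the sample numbers are made up for illustration only.

```python
def smoothed_score(p, n, robx, robs):
    """Robinson-style smoothed token score.

    p    : raw spam probability of the token from the training counts
    n    : number of trained messages containing the token
    robx : score assumed for a token that has never been seen
    robs : weight given to robx relative to the observed data
    """
    return (robs * robx + n * p) / (robs + n)


# A token never seen before scores exactly robx, whatever robs is.
unseen = smoothed_score(p=0.0, n=0, robx=0.7, robs=0.1)

# A token seen once, only in spam, is pulled only slightly towards
# robx when robs is small (about 0.97 here) ...
weak_prior = smoothed_score(p=1.0, n=1, robx=0.7, robs=0.1)

# ... and is pulled further towards robx as robs grows (about 0.85 here).
strong_prior = smoothed_score(p=1.0, n=1, robx=0.7, robs=1.0)
```

So a larger robs keeps rarely-seen tokens closer to robx for longer, and a robx inside the unsure zone keeps brand-new tokens from voting either way.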
> >>>
> >>> Thanks Tom. I found the doco and that looks like what I need.
> >>
> >> Just to be clear though, these are not "never-before-seen and
> >> rarely-seen tokens", they are tokens from spams that have been learned
> >> as ham. If you have a setup where you expect high levels of
> >> mistraining, then tuning Bogofilter to mitigate this is sensible -
> >> otherwise I'd want to know why it's happening.
> >> _______________________________________________
> >> Bogofilter mailing list
> >> Bogofilter at bogofilter.org
> >> http://www.bogofilter.org/mailman/listinfo/bogofilter
--
=============================================================================
Stephen Davies Consulting P/L Voice: 08-8177 1595
Adelaide, South Australia. Fax : 08-8177 0133
Computing & Network solutions. Mobile:040 304 0583
VoIP:sip:1132210 at sip1.bbpglobal.com