Question

Thu May 21 18:17:57 CEST 2009

It will always be a constant battle.  Spammers continually try new 
things to get past the filters, and some of them probably have honeypots 
feeding an instance of bogofilter, so they know what can defeat a 
typical wordlist.  As long as spammers exist, you'll surely get some 
small fraction of false negatives.  Mine are somewhere around 0.1% of my 
email.  Dumping those 5-10 spams/unsures a day into a folder and 
training on them is a minor daily task.  Incidentally, my hams are probably 
around 0.5% of my email.  The remaining 99.4% or so is correctly 
filtered by dnsbls, clamav, and bogofilter -- most of them rejected at 
smtp time.
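
For what it's worth, here is a rough sketch of the "train until it
classifies correctly" loop I mentioned earlier in the thread.  It is
only an illustration: train_to_exhaustion, is_spam and train_spam are
names made up for this example, not part of bogofilter.  In practice
is_spam would probably wrap a bogofilter run on the saved message and
train_spam would wrap the -Ns or -s training step, but that wiring is
left out here.

```python
from typing import Callable


def train_to_exhaustion(
    message: bytes,
    is_spam: Callable[[bytes], bool],
    train_spam: Callable[[bytes], None],
    max_rounds: int = 10,
) -> int:
    """Train on one missed spam until the classifier calls it spam.

    Returns the number of training passes that were needed.  Raises
    RuntimeError if the message is still not classified as spam after
    max_rounds passes, so a bad message cannot loop forever.
    """
    rounds = 0
    while not is_spam(message):
        if rounds >= max_rounds:
            raise RuntimeError("still not classified as spam; giving up")
        train_spam(message)
        rounds += 1
    return rounds


if __name__ == "__main__":
    # Toy stand-in for a real filter: it "learns" after three passes.
    passes = {"count": 0}

    def toy_is_spam(_msg: bytes) -> bool:
        return passes["count"] >= 3

    def toy_train(_msg: bytes) -> None:
        passes["count"] += 1

    print(train_to_exhaustion(b"example spam", toy_is_spam, toy_train))
```

The max_rounds cap is my own addition; without some limit, a message
that never crosses the spam cutoff would be trained on indefinitely.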

Tom

Stephen Davies wrote:
> OK. I'll try that.
> 
> I guess I should point out that bogofilter is correctly detecting several 
> hundred spams every day. It is only a relatively small number (say 30 per 
> day) that raise this issue.
> I have not had a single case where ham is incorrectly detected as spam in many 
> thousand emails.
> 
> I typically find that a single -Ns run is sufficient to give correct detection 
> but usually do one -Ns followed by four -s.
> 
> Despite that, I am still getting undetected spams.
> 
> Cheers,
> Stephen
> 
> On Thursday 21 May 2009 12:49:39 Thomas Anderson wrote:
>> Try increasing your robs.  Mine is 0.22.
>>
>> Other than that, you just have to train on errors.  Your false negatives
>> should decrease with training.  Try training til exhaustion, i.e. train
>> the same email repeatedly until it classifies correctly.  This should
>> prevent you from having to see the same email from many sources before
>> it classifies correctly.
>>
>> Tom
>>
>> Stephen Davies wrote:
>>> I understand.
>>>
>>> My initial issue is with the obvious spams not being detected first time
>>> round.
>>> The first I see of them is in my inbox as ham - despite being so
>>> obviously spam.
>>>
>>> If I save the email and run it through bogofilter -vvv, I get the results
>>> I posted.
>>>
>>> I then use bogofilter -Ns to "fix" the database and this seems to work -
>>> until the next spam with the same pattern but from a different source
>>> arrives. (bogofilter -vvv at this stage gives bogosity of 1.0).
>>>
>>> I have changed my min-dev, robx and robs to 0.35, 0.7, 0.1 but first
>>> indications are that this is not enough.
>>>
>>> On Thursday 21 May 2009 10:56:51 RW wrote:
>>>> On Thu, 21 May 2009 09:49:48 +0930
>>>>
>>>> Stephen Davies <scldad at sdc.com.au> wrote:
>>>>> On Thursday 21 May 2009 06:33:00 Thomas Anderson wrote:
>>>>>> You have to adjust your robx and robs values.  They will determine
>>>>>> where never-before-seen and rarely-seen tokens get scored.  E.g. if
>>>>>> you set your robx within your "unsure" zone, new tokens will never
>>>>>> score as ham or spam.  And with your robs, you can ensure that
>>>>>> tokens seen only a few times also remain less influential.
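
To make the effect of robx and robs concrete, here is a minimal sketch
of Gary Robinson's smoothing formula, f(w) = (s*x + n*p(w)) / (s + n),
with x = robx and s = robs.  It is an illustration under stated
assumptions, not bogofilter's actual code: token_score and is_scored
are hypothetical helper names, the counts are invented, and the min_dev
check assumes tokens scoring within min_dev of 0.5 are simply ignored.
The parameter values are the ones quoted in this thread.

```python
def token_score(spam_hits, ham_hits, spam_msgs, ham_msgs, robx, robs):
    """Robinson-smoothed spam probability f(w) for one token.

    spam_hits / ham_hits: messages of each class containing the token.
    spam_msgs / ham_msgs: total trained messages of each class.
    """
    if spam_msgs <= 0 or ham_msgs <= 0:
        raise ValueError("need at least one trained spam and one ham")
    n = spam_hits + ham_hits
    if n == 0:
        # Never-before-seen token: the score is exactly robx.
        return robx
    spam_rate = spam_hits / spam_msgs
    ham_rate = ham_hits / ham_msgs
    p = spam_rate / (spam_rate + ham_rate)  # raw estimate p(w)
    # robs weights the prior robx against the n observations.
    return (robs * robx + n * p) / (robs + n)


def is_scored(score, min_dev):
    """True if the token is far enough from 0.5 to be counted."""
    return abs(score - 0.5) >= min_dev


if __name__ == "__main__":
    for robs in (0.1, 0.22):
        # A token seen once, in a single message trained as ham.
        seen_once_in_ham = token_score(0, 1, 1000, 1000, robx=0.7, robs=robs)
        print(f"robs={robs}: seen once in ham -> {seen_once_in_ham:.4f}")
    unseen = token_score(0, 0, 1000, 1000, robx=0.7, robs=0.1)
    print(f"unseen token -> {unseen}, scored: {is_scored(unseen, 0.35)}")
```

A larger robs pulls a rarely seen token's score further toward robx, so
a single sighting carries less weight.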
>>>>> Thanks Tom. I found the doco and that looks like what I need.
>>>> Just to be clear though, these are not "never-before-seen and
>>>> rarely-seen tokens", they are tokens from spams that have been learned
>>>> as ham. If you have a setup where you expect high levels of
>>>> mistraining, then tuning Bogofilter to mitigate this is sensible -
>>>> otherwise I'd want to know why it's happening.
>>>> _______________________________________________
>>>> Bogofilter mailing list
>>>> Bogofilter at bogofilter.org
>>>> http://www.bogofilter.org/mailman/listinfo/bogofilter