Training question
Thomas Anderson
tanderson at orderamidchaos.com
Thu May 14 21:11:54 CEST 2009
It sounds as though your method is fine, but you just need to follow it
to its logical conclusion: train with "Ns" until the token is
recognized as spammy. I do this using bfproxy with the "Nsx" options;
it automatically keeps re-registering the message until it is correctly
classified or until it has tried a user-defined number of times.
http://orderamidchaos.com/bogofilter/bfproxy
E.g. I just registered one of these Acai spams this way, and here was
the output (with the "v" option):
subject: Treats blood pressure right!
original spamicity: 0.055994
user classification: spam
command: bogofilter -Ns
words: 88
new spamicity: 0.121036
new spamicity: 0.939520
It registered the spam the first time and tested it again to find that
it was still in the hammy range. Therefore, it registered it again,
this time pushing it well into the spammy range, so it stopped at that
point. My "rmax" limit is 50, so if it wasn't making any headway, it
would stop after 50 times to prevent an infinite loop. Rarely does it
need to repeat more than a few times though.
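The register-and-retest loop described above can be sketched in Python as follows. This is not bfproxy's code: the function names are hypothetical, and the only bogofilter behaviour relied on is its exit status (0 spam, 1 ham, 2 unsure, 3 error) and reading the message on stdin. The registration flags default to the "-Ns" command shown in the output above; whether repeat passes should use "-Ns" or plain "-s" (as Stephen does) is left to the caller. The rmax argument plays the role of the 50-pass cap.

```python
import subprocess
from typing import Callable

# bogofilter exit codes: 0 = spam, 1 = ham, 2 = unsure, 3 = error.
BOGO_SPAM, BOGO_ERROR = 0, 3


def retrain_until_spam(register: Callable[[], None],
                       is_spam: Callable[[], bool],
                       rmax: int = 50) -> int:
    """Register a message, re-test it, and repeat until it classifies as
    spam or rmax registrations have been made (guards against endless
    loops). Returns the number of registrations performed."""
    registrations = 0
    while registrations < rmax:
        register()
        registrations += 1
        if is_spam():
            break
    return registrations


def bogofilter_register(message: bytes, flags: str = "-Ns") -> None:
    # Hypothetical helper: pipe the message to `bogofilter <flags>` on stdin.
    result = subprocess.run(["bogofilter", flags], input=message)
    if result.returncode == BOGO_ERROR:
        raise RuntimeError("bogofilter registration failed")


def bogofilter_is_spam(message: bytes) -> bool:
    # Hypothetical helper: classify only (no registration); use the exit code.
    result = subprocess.run(["bogofilter"], input=message)
    if result.returncode == BOGO_ERROR:
        raise RuntimeError("bogofilter classification failed")
    return result.returncode == BOGO_SPAM


# Example wiring (assumes bogofilter is on PATH and "spam1" holds the message):
#   msg = open("spam1", "rb").read()
#   passes = retrain_until_spam(lambda: bogofilter_register(msg),
#                               lambda: bogofilter_is_spam(msg))
```

Keeping the loop separate from the subprocess calls makes the stopping rule easy to check on its own; "unsure" results count as not yet spam, so training continues.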
Since I started doing this, I no longer have the problem of having to
receive and correct similar spams many times. If I know that something
is a spam, I want bogofilter to recognize it as such the very first time
I see it. This exhaustive training method ensures that it does.
Tom
Stephen Davies wrote:
> The "good" numbers came from a period of a couple of days when my -Ns proc was
> broken and, as I asked, I don't know how to get rid of them.
>
> I do not use -u at all.
>
> I "retrain" by running each undetected spam through bogofilter -Ns once and
> then through bogofilter -s five times. I would expect - and the -w numbers
> seem to confirm - that this stacks the stats against these texts.
>
> Why does this not work?
>
> Stephen
>
> On Monday 11 May 2009 19:01:34 Matthias Andree wrote:
>> On 11.05.2009 at 07:15, Stephen Davies <scldad at sdc.com.au> wrote:
>>> One of the very common types of spam recently is weight loss by taking
>>> Acai berries.
>>>
>>> I have received thousands of spams with this in the subject and/or body
>>> and have fed them all into bogofilter as spam (after first reversing the
>>> initial ham entry).
>>>
>>> My word list now includes:
>>>              spam   good
>>> Acai        16084    321
>>> subj:Acai    5464    352
>>>
>>>
>>> Despite this, I still see:
>>> -bash-3.2# bogofilter -vvv < spam1 | grep Acai
>>> "subj:Acai" 5816 0.029983 0.015939 0.347094 -
>>> "Acai" 16406 0.027416 0.046919 0.631186 -
>>>
>>> What do I have to do to get these (and similar) words recognised as
>>> definitely spam?
>> How come more than 300 of these have been scored as good?
>>
>> If you are using bogofilter with "-u", be sure to THOROUGHLY retrain all
>> unsures and mis-classified messages. If you cannot or do not want to do
>> that, do not run bogofilter in "-u" mode.
>>
>> HTH
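One way to read the -vvv lines Stephen quoted: after the token, the columns appear to be the total count n, the token's frequency among ham and spam messages ("pgood", "pbad"), and the token score fw. That reading is an assumption, but it fits the numbers (5464 + 352 = 5816 for "subj:Acai"). The sketch below applies Robinson's formula, f(w) = (s*x + n*p(w)) / (s + n) with p(w) = pbad / (pbad + pgood), to reproduce fw, and backs out the message totals implied by count / frequency. The s and x defaults are assumed bogofilter defaults (robs, robx); with n in the thousands they barely affect the result.

```python
def robinson_fw(pgood: float, pbad: float, n: int,
                s: float = 0.0178, x: float = 0.52) -> float:
    """Robinson's smoothed token score f(w).

    pgood/pbad: token count divided by the number of ham/spam messages.
    n: total number of times the token has been registered.
    s, x: assumed bogofilter defaults (robs, robx)."""
    p = pbad / (pbad + pgood)         # raw spam probability of the token
    return (s * x + n * p) / (s + n)  # shrinks toward x only when n is small


def implied_message_count(token_count: int, frequency: float) -> float:
    """Back out a class's message total from token count / frequency."""
    return token_count / frequency


# Figures from the "subj:Acai" line and the word list quoted above.
fw = robinson_fw(pgood=0.029983, pbad=0.015939, n=5816)
ham_msgs = implied_message_count(352, 0.029983)    # roughly 11,700 ham messages
spam_msgs = implied_message_count(5464, 0.015939)  # roughly 343,000 spam messages
print(f"f(w) = {fw:.4f}, ham msgs ~ {ham_msgs:.0f}, spam msgs ~ {spam_msgs:.0f}")
```

Under these assumptions the token still scores below 0.5 because it appears in a larger share of the registered ham than of the far more numerous registered spam, even though its raw spam count is much higher.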