Messages to myself fail.
Thomas Anderson
tanderson at orderamidchaos.com
Sun May 8 20:56:57 CEST 2011
I recommend training to exhaustion. That is, create a recursive loop in
your script which tests each message's bogosity after training, and then
retrains that message if it is still misclassified (for example, ham that
still tests as spam). Look at the "train" subroutine here for an example:
http://orderamidchaos.com/bogofilter/bfproxy. You could retrain your
entire email set again, but there's no need to retest or retrain the
messages which now test correctly, so it's best to add this to the
subroutine which tests and trains each individual email. Also, if you
retrained the whole set by running your script several times, how
would you know when to stop without manually checking the output and
running it again each time? A recursive loop which tests the result
after each training pass decides automatically how many times to retrain.
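
To make the idea concrete, here is a minimal sketch in Python. It is not
the bfproxy "train" subroutine, and it uses a plain loop with a cap in
place of recursion. The function names, the max_rounds cap, and the -1
return value are assumptions made for illustration. The only bogofilter
features it relies on are the -s, -n and -I options and the documented
exit codes (0 = spam, 1 = ham, 2 = unsure).

```python
import subprocess

# bogofilter exit codes when classifying: 0 = spam, 1 = ham, 2 = unsure.
EXIT_TO_LABEL = {0: "spam", 1: "ham", 2: "unsure"}


def classify(path):
    """Classify one message file with the bogofilter command-line tool."""
    result = subprocess.run(["bogofilter", "-I", path],
                            stdout=subprocess.DEVNULL)
    return EXIT_TO_LABEL.get(result.returncode, "error")


def register(path, label):
    """Register the message as spam (-s) or as ham (-n)."""
    flag = "-s" if label == "spam" else "-n"
    subprocess.run(["bogofilter", flag, "-I", path], check=True)


def train_to_exhaustion(path, want, classify_fn=classify,
                        register_fn=register, max_rounds=10):
    """Retrain one message until it classifies as `want`.

    Returns the number of training rounds used, or -1 if the message is
    still misclassified after max_rounds rounds (a hypothetical cap so
    the loop cannot run forever).
    """
    rounds = 0
    while classify_fn(path) != want:
        if rounds == max_rounds:
            return -1
        register_fn(path, want)
        rounds += 1
    return rounds
```

Calling train_to_exhaustion("test-message.eml", "ham") on a message that
was wrongly filed as spam would then retrain only that message, and only
as many times as it needs. The file name here is a placeholder.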
Tom
On 5/6/2011 10:41 AM, John Culleton wrote:
> On Tuesday 12 April 2011 15:20:01 John Culleton wrote:
>> On Tuesday 12 April 2011 15:13:04 John Culleton wrote:
>>> I send a message to myself using one email address to send and
>>> another to receive. Routinely it gets a bogosity of 1.0 and gets
>>> sent to the Spam folder. I move it from Spam to Ham and rerun
>>> bogofilter. Then I run a similar or identical test message and
>>> it goes to Spam again.
>>>
>>> When I run my spam filter script I get numbers like these:
>>>
>>> -- Total messages: 6036
>>>
>>> Total ham: 825
>>> Misdetected ham: 220
>>> retrain fail: 8
>>>
>>> Total spam: 5211
>>> Misdetected spam: 280
>>> retrain fail: 14
>>>
>>> Any hints on what I can do?
>>>
>>> John Culleton
>>> Create Book Covers with Scribus:
>>> http://www.booklocker.com/books/4055.html
>>
>> I reran my bogofilter script a couple of times and the numbers are
>> changing:
>> Total messages: 6040
>>
>> Total ham: 825
>> Misdetected ham: 12
>> retrain fail: 7
>>
>> Total spam: 5216
>> Misdetected spam: 19
>> retrain fail: 11
>>
>> Are these good or bad? Should I just keep rerunning my script?
>
> Let me ask again: Does running my script several times in a row yield
> better results than just once a day? The retrain fail number seems to
> go down each time through about 5 iterations of my training script.