[cvs] Potential for error?

David Relson relson at osagesoftware.com
Mon Oct 21 18:47:07 CEST 2002


At 12:37 PM 10/21/02, Allison, Thomas wrote:

>oops.
>I was trying to find the one I was subscribed to from home and ended up
>here...
>
>It was an afterthought on my way to work based on your message about having
>such a large body of email to work with.
>
>What I was driving at is the idea that early on, each change in a value may
>shift the probability as much as 10%.  But in time, this will diminish.
>
>If there is a sudden shift in the type of spam jargon used, will it take a
>very long time for bogofilter to adjust?
>
>I would guess that one method of testing would be to start fresh and train
>with very very old spam and then test it against very new spam and see how
>quickly it can adapt.

Sounds like a good test.  Let me suggest the following sequence:

1.   Build word lists (save a copy)
2.   Run messages (without updating word lists) to establish a baseline.
3.   Run messages using update mode to measure effect.

4.   With fresh word lists (from copy)
5.   Run messages with update mode.  Any spam classified as non-spam should be 
corrected (bogofilter -S).
6, 7, ...  Repeat #5 (several times)

For steps 2, 3, 5, 6, 7, ... record the number correct, the number 
incorrect, and the percentage correct.

The above will show the effect of additional training as the spam 
vocabulary changes.
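
The bookkeeping for each step could be sketched roughly as below.  This is 
only an illustration in Python: the names StepResult, tally and report are 
made up for this example, and it assumes you have already collected, by 
whatever means you run bogofilter, one True/False verdict per message 
(True meaning "classified as spam").  It does not invoke bogofilter itself.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    """Counts for one step of the test sequence (hypothetical helper)."""
    step: str
    correct: int
    incorrect: int

    @property
    def percent_correct(self) -> float:
        total = self.correct + self.incorrect
        # Avoid dividing by zero when a step processed no messages.
        return 100.0 * self.correct / total if total else 0.0


def tally(step, verdicts, expected_spam=True):
    """Compare each verdict (True = classified as spam) with the expected label.

    Assumes every message in the run has the same true label, e.g. a
    corpus consisting only of new spam.
    """
    verdicts = list(verdicts)
    correct = sum(1 for v in verdicts if v == expected_spam)
    return StepResult(step, correct, len(verdicts) - correct)


def report(results):
    """Format one line per step: step, #correct, #incorrect, percentage."""
    lines = []
    for r in results:
        lines.append(
            f"{r.step:<10} {r.correct:>8} {r.incorrect:>10} {r.percent_correct:>7.1f}%"
        )
    return "\n".join(lines)
```

Calling tally() once per step (2, 3, 5, 6, 7, ...) and passing the collected 
results to report() gives a small table that can be compared from run to run.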