[cvs] Potential for error?
David Relson
relson at osagesoftware.com
Mon Oct 21 18:47:07 CEST 2002
At 12:37 PM 10/21/02, Allison, Thomas wrote:
>oops.
>I was trying to find the one I was subscribed to from home and ended up
>here...
>
>It was an afterthought on my way to work based on your message about having
>such a large body of email to work with.
>
>What I was driving at is the idea that early on, each change in a value may
>shift the probability by as much as 10%. But in time, this will diminish.
>
>If there is a sudden shift in the type of spam jargon used, will it take a
>very long time for bogofilter to adjust?
>
>I would guess that one method of testing would be to start fresh and train
>with very very old spam and then test it against very new spam and see how
>quickly it can adapt.
Sounds like a good test. Let me suggest the following sequence:
1. Build word lists (save a copy)
2. Run messages (without updating word lists) to establish a baseline.
3. Run messages using update mode to measure effect.
4. Start again with fresh word lists (restored from the copy).
5. Run messages with update mode. Any message classified as non-spam
should be corrected (bogofilter -S).
6. Repeat step 5 (several times).
For steps 2, 3, 5, 6, and each further repeat, record the number correct,
the number incorrect, and the percentage correct.
The above will show the effect of additional training as the spam
vocabulary changes.
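
The bookkeeping for the recorded figures could be sketched roughly as below. This is only an illustration in Python, not part of bogofilter: the names StepResult, tally and report are hypothetical, and the sketch assumes you already have, for each step, a list of True/False values saying whether each message was classified correctly. How those values are obtained from bogofilter is left out.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    """Counts recorded for one step of the test sequence (hypothetical helper)."""
    step: str
    correct: int
    incorrect: int

    @property
    def percent_correct(self) -> float:
        total = self.correct + self.incorrect
        # Avoid dividing by zero when a step had no messages.
        return 100.0 * self.correct / total if total else 0.0


def tally(step: str, verdicts: list[bool]) -> StepResult:
    """verdicts[i] is True when message i was classified correctly."""
    correct = sum(verdicts)
    return StepResult(step, correct, len(verdicts) - correct)


def report(results: list[StepResult]) -> str:
    """Format one line per step: #correct, #incorrect, percentage correct."""
    lines = [f"{'step':<6}{'correct':>8}{'incorrect':>10}{'% correct':>10}"]
    for r in results:
        lines.append(
            f"{r.step:<6}{r.correct:>8}{r.incorrect:>10}{r.percent_correct:>10.1f}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder verdicts only, to show the call shape; not real results.
    placeholder = [True, False, True, True]
    print(report([tally("2", placeholder)]))
```

Calling tally once per step (2, 3, 5, 6, and each repeat) and passing the collected results to report gives one table in which the trend across repeats is easy to see.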