Spam / ham registration issue
tanderso at oac-design.com
Wed Mar 3 08:20:58 EST 2004
On Wed, 2004-03-03 at 08:07, Tig wrote:
> Thanks heaps for the reply people. My understanding now is: Each word
> in the test case has been registered as spam and ham, so therefore
> balance out and give a neutral result. It does not matter how many
> times a word is registered as spam or ham, just the fact that it has
> been recorded as either or both.
It does matter how many times it is registered. In this case, each "-s"
registration increased the spamicity of each token, but only
fractionally. Look at pi's results with -vvv. It would require many
more than 4 registrations to place these tokens outside of your min_dev
range though. If you continually register with "-s", you'll eventually
make these tokens spammy again. Altering your robs value may reduce the
number of times needed.
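
To make the "only fractionally" point concrete, here is a minimal sketch, assuming the Robinson-style smoothing f(w) = (robs * robx + n * p(w)) / (robs + n) and assuming a token is ignored while |f(w) - 0.5| < min_dev. The function names, token counts, message totals and parameter values are all made up for illustration; they are not taken from the test case in this thread and are not necessarily bogofilter's defaults. For simplicity the message totals are held fixed while the token's spam count grows.

```python
def token_score(spam_count, ham_count, spam_msgs, ham_msgs, robs=0.01, robx=0.5):
    """Smoothed spamicity of one token (hypothetical helper, not bogofilter code)."""
    n = spam_count + ham_count
    if n == 0:
        return robx  # unseen token: fall back to the prior robx
    spam_freq = spam_count / spam_msgs
    ham_freq = ham_count / ham_msgs
    p = spam_freq / (spam_freq + ham_freq)  # raw ratio p(w)
    # Blend p(w) with robx; robs sets how strongly robx pulls on the result.
    return (robs * robx + n * p) / (robs + n)


def extra_spam_registrations(spam_count, ham_count, spam_msgs, ham_msgs,
                             min_dev=0.1, robs=0.01, robx=0.5, limit=100000):
    """Count extra spam registrations until the token leaves the min_dev band."""
    for extra in range(limit + 1):
        fw = token_score(spam_count + extra, ham_count, spam_msgs, ham_msgs,
                         robs=robs, robx=robx)
        if abs(fw - 0.5) >= min_dev:
            return extra
    return None  # never left the band within `limit` registrations


if __name__ == "__main__":
    # Illustrative token: seen 20 times in spam and 20 times in ham,
    # with 1000 spam and 1000 ham messages registered overall.
    for extra in range(5):
        print(extra, round(token_score(20 + extra, 20, 1000, 1000), 4))
    print("needed:", extra_spam_registrations(20, 20, 1000, 1000))
```

With these made-up numbers each additional registration nudges the score upward by a small step, and four of them are not enough to leave the min_dev band. Passing a different robs or min_dev to extra_spam_registrations shows how the required count depends on those settings.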
That being said, I'm struck by the huge effect of the first registration
and the minor effect of the following ones. I would think a 2:1
spam/ham registration ratio would put the scores above 0.5 or that the
1:0 ratio would be lower. A result of the chi-square processing