chi-combining

Greg Louis glouis at dynamicro.on.ca
Sun Nov 17 17:37:52 CET 2002


On 20021116 (Sat) at 16:58:20 -0500, Greg Louis wrote:

> > >  Under your original method, the best value
> > > keeps changing as you fiddle other parameters of the system.  If he kept the
> > > same cutoff while fiddling "minimum deviation", and found a decrease in FP
> > > but an increase in FN, the most obvious guess is that he should also have
> > > lowered the cutoff for calling a thing spam (which would decrease FN and
> > > increase FP).
> 
> I deliberately left the spam_cutoff level static as min_dev varied. 
> It would have been more precise to level out false positives at each
> level of min_dev by adjusting the cutoff, and to compare only false
> negatives after that was done; I thought that the additional
> information gained would probably not justify the labour.  If I'd seen
> really major differences as min_dev varied, I would have done it.

Well, I did it.  The more I thought about it, the more I thought the
experiment wasn't really very easy to interpret without this
refinement.  As of a few minutes ago, there's a major revision of the
paper at http://www.bgl.nu/~glouis/bogofilter/fisher.html with the
cutoff for each run adjusted to produce the same number of false
positives.  Turns out that if you do this, both methods produce pretty
much the same number of false negatives at any level of min_dev (the
threshold difference between f(w) and 0.5, above which a token is
included in the calculations).  The number of false negatives increases
(discrimination power worsens) at min_dev 0.25 and above, which is
consistent with my original observation that min_dev=0 was better than
min_dev=0.4.  The probable cause is that we're discarding too much
information at those levels.
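
A minimal Python sketch of the comparison described above, not the code
used for the paper.  The function names, the ">=" boundary on min_dev, and
the rule "spam when score > cutoff" are all assumptions made for
illustration.  It shows token selection by min_dev, then choosing the
cutoff that holds false positives at a fixed count so that only false
negatives remain to be compared.

```python
def select_tokens(fw_values, min_dev):
    """Keep token scores f(w) that differ from 0.5 by at least min_dev.

    Assumption: the boundary is inclusive (>=); the original may be strict.
    """
    return [f for f in fw_values if abs(f - 0.5) >= min_dev]


def cutoff_for_fp(ham_scores, target_fp):
    """Return a cutoff giving at most target_fp false positives.

    Assumption: a message is called spam when its score is strictly
    greater than the cutoff.  Ties at the cutoff can give fewer than
    target_fp false positives.
    """
    ranked = sorted(ham_scores, reverse=True)
    if target_fp >= len(ranked):
        # Every ham message may be misclassified, so any cutoff will do.
        return float("-inf")
    return ranked[target_fp]


def count_false_negatives(spam_scores, cutoff):
    """Count spam messages that are not scored above the cutoff."""
    return sum(1 for s in spam_scores if s <= cutoff)


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    ham = [0.1, 0.2, 0.9, 0.95, 0.3]
    spam = [0.99, 0.85, 0.97, 0.5]
    cutoff = cutoff_for_fp(ham, target_fp=1)
    print(cutoff, count_false_negatives(spam, cutoff))
```

Running cutoff_for_fp once per combining method and per min_dev value, with
the same target_fp, puts every run on the same false-positive footing
before the false-negative counts are compared.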

Given that we're applying the calculations to the same individual
probability values, I find the congruency of the two calculation
methods somewhat reassuring.  The next step is to try to understand
whether the spambayes folks get better _binary_ results with
chi-squared, or whether the better results are attributable to
chi-squared facilitating the identification of problem cases.  Tim, is
there somewhere I can read up on the testing methods and outcomes that
the spambayes project uses/obtains, without having to spelunk in the
mailing-list archives?

-- 
| G r e g  L o u i s          | gpg public key:      |
|   http://www.bgl.nu/~glouis |   finger greg at bgl.nu |



