training on errors only

Sun Dec 1 20:31:26 CET 2002

On 20021201 (Sun) at 1159:55 -0500, Bill Yerazunis wrote:
> 
> I'm looking forward to the LJ article to see how you actually
> implement chi, so I can steal your code. :)

You could pull it out of bogofilter any time, of course ;)
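
For readers without the bogofilter source to hand, here is a minimal
sketch of the chi-square combining being discussed. It assumes the
Robinson-Fisher formulation (inverse chi-square over -2 ln of the
product of per-token probabilities, with 2n degrees of freedom). It is
not bogofilter's actual code, and the names chi2q and fisher_indicator
are made up for illustration.

```python
import math

def chi2q(x2, df):
    """Upper-tail probability of a chi-square variate, even df only."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    m = x2 / 2.0
    term = math.exp(-m)      # naive: underflows to 0.0 when m is very large
    total = term
    for i in range(1, df // 2):
        term *= m / i        # builds exp(-m) * m**i / i! incrementally
        total += term
    return min(total, 1.0)   # guard against rounding slightly above 1

def fisher_indicator(probs):
    """Combine per-token spam probabilities (each strictly in (0, 1)).

    Returns a value near 1 for spammy input, near 0 for hammy input,
    and 0.5 when the evidence balances out.
    """
    n = len(probs)
    ln_p = sum(math.log(p) for p in probs)
    ln_q = sum(math.log(1.0 - p) for p in probs)
    h = chi2q(-2.0 * ln_p, 2 * n)   # small when many tokens look hammy
    s = chi2q(-2.0 * ln_q, 2 * n)   # small when many tokens look spammy
    return (1.0 + h - s) / 2.0
```

Calling fisher_indicator([0.99] * 10) gives a value close to 1, and
fisher_indicator([0.5] * 4) gives exactly 0.5.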

>    > I'd say the last experiment you need to run for the article is a
>    > comparison of feature types.  If my understanding is correct, your
>    > experiments point out that at _best_ BCR can do is 90% accuracy if
>    > every feature in the TOE'd training set is used.

Yes.  I could well have botched something; the writeup is in place now
at http://www.bgl.nu/~glouis/bogofilter/BcrFisher.html and there's a C
code snippet that's supposed to do what crm114 does, slightly
complicated by the inclusion of f(w).  I wasn't expecting so many
errors, but I've pored over the code myself and I can't see anything
wrong with it (other than that it's awfully ugly, but it's a throwaway
anyhow).  Maybe you'll spot the flaw, if there is one?
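
To make the idea concrete, the following is a rough sketch of a
Bayesian chain rule update fed by f(w) instead of the raw per-feature
probability. It is not the C snippet from the writeup and not CRM114's
code. It assumes Robinson's f(w) = (s*x + n*p(w)) / (s + n), and the
defaults s=1.0, x=0.5 and prior=0.5 are illustrative assumptions, as
are the function names.

```python
def f_w(p_w, n, s=1.0, x=0.5):
    """Robinson's smoothed probability for a feature seen n times.

    s is the weight given to the background estimate x (both assumed).
    """
    return (s * x + n * p_w) / (s + n)

def chain_rule(features, prior=0.5):
    """Fold (p_w, n) pairs into one spam probability.

    Each posterior becomes the prior for the next feature.
    """
    p = prior
    for p_w, n in features:
        f = f_w(p_w, n)
        p = p * f / (p * f + (1.0 - p) * (1.0 - f))
    return p
```

For example, chain_rule([(1.0, 1)]) returns 0.75: a feature seen once,
only in spam, is pulled halfway back toward the 0.5 background.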

>    > Can you put CRM114's SBPH/BCR classifier into your script and see how
>    > it does against your test data?  I've put the appropriate command-line
>    > invocations below to run just those pieces of CRM114 functionality.

I've still got to write up the naïve-Bayes experiment I did, and pay
bills, and a few other bits of mundane stuff like that, but I'll take a
look at it someday soon.

-- 
| G r e g  L o u i s          | gpg public key:      |
|   http://www.bgl.nu/~glouis |   finger greg at bgl.nu |