default parameters - new vs old vs mine

Tue Mar 30 15:21:37 CEST 2004

On Tue, 30 Mar 2004 08:16:35 +0200
Boris 'pi' Piwinger wrote:

...[snip]...

> Very interesting results. AFAIU you have trained with all
> those messages. So it would be interesting to build a
> database with the first 90% of the messages and test with
> the rest.
> 
> Also it would be nice to see how it works if you don't allow
> unsures (isn't that what the default is?). Would you still
> choose the same spam_cutoff?
> 
> pi

pi,

As you guessed, the wordlist contained all the testing messages (except
for the use of thresh_update=0.01 since 14 Jan 2004).  I'll be doing
another test using a wordlist built from 2% of the test corpora and then
scoring 90% of the corpora.  This will give comparison numbers for a
less complete wordlist.  The results of the additional test will be
available sometime after I get home from work tonight.

Since I score using the tri-state (Spam/Ham/Unsure) method, that's how
I report the results. You can add the unsures to the ham counts if you
wish to look at two-state results.
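
A minimal sketch of that conversion, in Python, in case it helps. The
names (TriState, to_two_state) and the counts are made up for
illustration; they are not part of bogofilter and not results from my
tests.

```python
from typing import NamedTuple, Tuple


class TriState(NamedTuple):
    """Message counts from a tri-state (Spam/Ham/Unsure) scoring run."""
    spam: int
    ham: int
    unsure: int


def to_two_state(counts: TriState) -> Tuple[int, int]:
    """Collapse tri-state counts to (spam, ham) by adding unsures to ham."""
    return counts.spam, counts.ham + counts.unsure


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    example = TriState(spam=40, ham=50, unsure=10)
    spam, ham = to_two_state(example)
    print(f"two-state: spam={spam} ham={ham}")
```

The total message count is unchanged; only the unsure bucket is folded
into ham.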

David