Testing fisher

Greg Louis glouis at dynamicro.on.ca
Mon Jan 27 14:43:44 CET 2003


On 20030127 (Mon) at 0135:50 +0100, Boris 'pi' Piwinger wrote:
> Hi!
> 
> I have done some extensive testing of fisher constants (two
> states). My setup was somewhat unusual. First I did a training
> run with about 15000 hams and 4000 spams. This went immediately
> into production (using -u and manual corrections). With this
> live data I was testing my training database (sic!), which was
> occasionally enlarged with new ham and spam mails. So I was not
> testing bogofilter on new mail, but on known mail. Here are the
> results:

And very nice too :)  They're more or less consistent with what we've
seen in past testing.  It's well known that running a training db
against itself will give better results than you obtain with new
messages, but it may be that for checking parameters like min_dev and
the spam cutoff, this is a good method -- you get lots of messages
to test with.
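
(For anyone wanting to repeat the experiment, the procedure amounts
to roughly the sketch below -- an illustration, not anybody's actual
script.  The corpus paths are invented, one message per file, and it
relies on bogofilter's exit status: 0 for spam, 1 for nonspam.)

    import subprocess
    from pathlib import Path

    def classify(msg):
        # bogofilter reads one message on stdin;
        # exit status 0 means spam, 1 means nonspam
        with msg.open("rb") as fh:
            return subprocess.run(["bogofilter"], stdin=fh).returncode

    # initial training from the known corpora
    for flag, corpus in (("-n", "corpus/ham"), ("-s", "corpus/spam")):
        for m in Path(corpus).iterdir():
            with m.open("rb") as fh:
                subprocess.run(["bogofilter", flag], stdin=fh)

    # later: score the training messages themselves, count the errors
    fp = sum(classify(m) == 0 for m in Path("corpus/ham").iterdir())
    fn = sum(classify(m) == 1 for m in Path("corpus/spam").iterdir())
    print("false positives:", fp, " false negatives:", fn)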

Looks like a min_dev of 0.025 and a cutoff of 0.6 worked quite well
for you with the robx and robs values you employ.  That's interesting;
I was doing some tuning yesterday and found that a min_dev of 0.25
worked best for me -- a whole order of magnitude bigger!  That
underlines the value of doing one's own tuning from time to time; I'd
been using 0.1 for months and it had seemed ok.
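
For anyone who hasn't looked at where those knobs bite: robs and robx
smooth each token's raw spam probability toward the prior, and min_dev
then discards tokens that land too close to the neutral 0.5 before
Fisher's chi-square combination is applied.  In rough Python (a sketch
of the Robinson-Fisher calculation as I understand it, not
bogofilter's actual source; the parameter values are just the ones
from this thread or placeholders):

    from math import exp, log

    def chi2_sf(x, df):
        # survival function P(X >= x) of chi-square with even df
        m = x / 2.0
        term, total = exp(-m), exp(-m)
        for i in range(1, df // 2):
            term *= m / i
            total += term
        return min(total, 1.0)

    def spamicity(counts, nspam, nham,
                  robs=0.0178, robx=0.52, min_dev=0.025):
        # counts: (spam_count, ham_count) per token;
        # nspam/nham: sizes of the training corpora
        used = []
        for sc, hc in counts:
            p = (sc / nspam) / (sc / nspam + hc / nham)  # raw spam prob
            n = sc + hc
            f = (robs * robx + n * p) / (robs + n)   # robs/robx smoothing
            if abs(f - 0.5) >= min_dev:              # the min_dev filter
                used.append(f)
        if not used:
            return 0.5      # nothing discriminating; call it neutral
        N = len(used)
        S = chi2_sf(-2.0 * sum(log(f) for f in used), 2 * N)
        H = chi2_sf(-2.0 * sum(log(1.0 - f) for f in used), 2 * N)
        # compare the result against spam_cutoff (e.g. 0.6)
        return (1.0 + S - H) / 2.0

Raising min_dev, as I did, just makes that filter stricter, so only
strongly hammy or spammy tokens get a vote.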

Anyhow, thanks for the report.  We need this kind of information -- as
much of it as we can get -- in order to try to pick defaults that
newbies can use without getting discouraged by bad performance.

-- 
| G r e g  L o u i s          | gpg public key:      |
|   http://www.bgl.nu/~glouis |   finger greg at bgl.nu |
| Help free our mailboxes. Include                   |
|        http://wecanstopspam.org in your signature. |