Testing fisher
Greg Louis
glouis at dynamicro.on.ca
Mon Jan 27 14:43:44 CET 2003
On 20030127 (Mon) at 0135:50 +0100, Boris 'pi' Piwinger wrote:
> Hi!
>
> I have done some extensive testing of fisher constants (two
> states). My setup was somewhat unusual. First I did a
> training run with about 15000 hams and 4000 spams. This went
> into production immediately (using -u and manual
> corrections). With this live data I was testing my training
> database (sic!), which was occasionally enlarged by new ham
> and spam mails. So I was not testing bogofilter on new mail,
> but on known mail. Here are the results:
And very nice too :) They're more or less consistent with what we've
seen in past testing. It's well known that running a training db
against itself gives better results than you'd get on new messages,
but for checking parameters like min_dev and the spam cutoff it may
still be a good method -- you get lots of messages to test with.
Looks like a min_dev of 0.025 and a cutoff of 0.6 worked quite well
for you with the robx and robs values you employ. That's interesting;
I was doing some tuning yesterday and found that a min_dev of 0.25
worked best for me -- a whole order of magnitude bigger! That
underlines the value of doing one's own tuning from time to time --
I'd been using 0.1 for months and it had seemed ok.
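For anyone tuning these knobs, here is a minimal Python sketch of the
scoring scheme under discussion, following Gary Robinson's published
f(w)/Fisher formulation that bogofilter's two-state fisher method is
based on. The parameter names (robs, robx, min_dev, spam_cutoff) match
bogofilter's options, but the particular values, token counts, and f(w)
numbers below are invented for illustration, not recommended defaults:

```python
import math

# Illustrative sketch of Robinson/Fisher scoring; parameter names
# follow bogofilter's config, values here are made up for the example.
ROBS = 0.001       # robs: weight given to the prior robx
ROBX = 0.415       # robx: assumed spam probability for rare tokens
MIN_DEV = 0.025    # min_dev: ignore tokens with f(w) near 0.5
SPAM_CUTOFF = 0.6  # spam_cutoff: final score >= this means spam

def f_w(spam_count, ham_count, spam_total, ham_total):
    """Robinson's smoothed token probability:
    f(w) = (robs*robx + n*p(w)) / (robs + n),
    where n is the token's total count and p(w) its raw spam ratio."""
    spam_ratio = spam_count / spam_total
    ham_ratio = ham_count / ham_total
    p = spam_ratio / (spam_ratio + ham_ratio)
    n = spam_count + ham_count
    return (ROBS * ROBX + n * p) / (ROBS + n)

def chi2_tail(chi2, df):
    """Upper-tail probability of a chi-square variate (df even):
    exp(-m) * sum_{i<df/2} m^i / i!  with m = chi2/2."""
    m = chi2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def fisher_score(token_probs):
    """Fisher's method: S combines the f(w), H the (1 - f(w));
    the final indicator (1 + S - H) / 2 lies in [0, 1]."""
    used = [p for p in token_probs if abs(p - 0.5) > MIN_DEV]
    if not used:
        return 0.5                      # no evidence either way
    df = 2 * len(used)
    s = chi2_tail(-2.0 * sum(math.log(p) for p in used), df)
    h = chi2_tail(-2.0 * sum(math.log(1.0 - p) for p in used), df)
    return (1.0 + s - h) / 2.0

probs = [0.99, 0.93, 0.51, 0.08, 0.97]  # invented f(w) values; 0.51 is
score = fisher_score(probs)             # dropped by the min_dev filter
print("%.3f -> %s" % (score, "spam" if score >= SPAM_CUTOFF else "ham"))
```

Note how tokens whose f(w) falls within min_dev of 0.5 are simply
ignored, which is why raising min_dev from 0.025 to 0.25 changes which
evidence gets counted at all, not just how it is weighted.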
Anyhow, thanks for the report. We need this kind of information -- as
much of it as we can get -- in order to try to pick defaults that
newbies can use without getting discouraged by bad performance.
--
| G r e g L o u i s | gpg public key: |
| http://www.bgl.nu/~glouis | finger greg at bgl.nu |
| Help free our mailboxes. Include |
| http://wecanstopspam.org in your signature. |
More information about the Bogofilter mailing list