Testing fisher

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Mon Jan 27 14:51:12 CET 2003


Greg Louis wrote:

>> I have done some extensive testing of fisher constants (two
>> states). My setup was somewhat unusual. First I trained on
>> about 15000 hams and 4000 spams. This went straight
>> into production (using -u and manual
>> corrections). With this live data I was testing my training
>> database (sic!), which was occasionally enlarged by new ham
>> and spam mails. So I was not testing bogofilter on new mail,
>> but on known mail. Here are the results:
> 
> And very nice too :)  They're more or less consistent with what we've
> seen in past testing.  It's well known that running a training db
> against itself will give better results than you obtain with new
> messages,

That is what we would expect. My idea was to see how well
bogofilter had learned.

> but it may be that for checking parameters like min_dev and
> the spam cutoff, this is a good method -- you get lots of messages
> to test with.

This was the second reason for my test. I wanted to find
"the right parameters".

> Looks like min_dev of 0.025 and cutoff 0.6 worked quite well for you

Indeed. As I said in an earlier mail, almost no mail scored
over 0.6 unless it was spam. Interestingly enough, I also don't
get false positives in real use (of course, it is too
early to tell). Those values work great right now.

> with the robx and robs values you employ. 

Since I don't really understand those (I haven't looked at the
math), I left them at their default values.
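
For readers who have not looked at the math either: in Gary Robinson's scheme, robx (x) is the probability assumed for a token with no training evidence, and robs (s) is how strongly that prior is weighted against the observed counts, via f(w) = (s*x + n*p(w)) / (s + n). The Python sketch below follows Robinson's published formula. It is an illustration, not bogofilter's source code, and the helper name and arguments are hypothetical.

```python
# Sketch of Gary Robinson's smoothed token probability f(w).
# Assumption: follows Robinson's published formula, not bogofilter's source.

def robinson_f(spam_hits: int, ham_hits: int,
               spam_msgs: int, ham_msgs: int,
               robx: float, robs: float) -> float:
    """Smoothed spamminess f(w) of one token (hypothetical helper).

    spam_hits, ham_hits: training spams/hams that contained the token.
    spam_msgs, ham_msgs: total spams/hams in training (both > 0).
    robx: probability assumed for a token with no evidence (x).
    robs: weight of that prior, counted like token occurrences (s).
    """
    n = spam_hits + ham_hits
    if n == 0:
        return robx  # unseen token: only the prior is left
    spam_rate = spam_hits / spam_msgs
    ham_rate = ham_hits / ham_msgs
    p = spam_rate / (spam_rate + ham_rate)  # raw estimate p(w)
    # Weighted average of the prior (weight robs) and the data (weight n).
    return (robs * robx + n * p) / (robs + n)


# Training sizes roughly as in the post; robx/robs are illustrative only.
for spam_hits, ham_hits in [(0, 0), (1, 0), (10, 0), (0, 10)]:
    f = robinson_f(spam_hits, ham_hits, 4000, 15000, robx=0.5, robs=1.0)
    print(spam_hits, ham_hits, round(f, 3))
```

With robx = 0.5 and robs = 1.0 (illustrative values, not bogofilter's defaults), a token seen in one spam and no ham gets f(w) = 0.75 rather than 1.0, and one seen in ten spams gets about 0.955. A larger robs pulls rarely seen tokens harder toward robx.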

> That's interesting; I was
> doing some tuning yesterday and found that 0.25 worked best for me;

For min_dev? That is huge!

> Anyhow, thanks for the report.  We need this kind of information -- as
> much of it as we can get -- in order to try to pick defaults that
> newbies can use without getting discouraged by bad performance.

I think the default cutoff is too high (though mine would
be too low as a default). Maybe min_dev should also be
positive by default.
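
To illustrate what min_dev and the spam cutoff do, here is a Python sketch of Robinson's Fisher (chi-square) combining as it is commonly described. Tokens whose f(w) lies within min_dev of 0.5 are dropped, the remaining values are combined into one score, and in two-state mode a message counts as spam when the score reaches the cutoff. This is an illustration under assumptions, not bogofilter's code. The function names are hypothetical, and whether each comparison is strict or inclusive is a guess.

```python
import math


def chi2q(x2: float, dof: int) -> float:
    """Upper-tail chi-square probability; exact series for even dof."""
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def fisher_score(probs, min_dev: float) -> float:
    """Combine token values f(w) into one score in [0, 1] (hypothetical helper)."""
    # Ignore tokens that are too close to neutral (0.5) to be informative.
    clues = [p for p in probs if abs(p - 0.5) >= min_dev]
    if not clues:
        return 0.5  # no usable evidence: neutral score
    # Keep logs finite if a value is exactly 0 or 1.
    clues = [min(max(p, 1e-12), 1.0 - 1e-12) for p in clues]
    dof = 2 * len(clues)
    # Near 1 when the f(w) values are collectively low (hammy).
    ham = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in clues), dof)
    # Near 1 when the f(w) values are collectively high (spammy).
    spam = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in clues), dof)
    return (1.0 + spam - ham) / 2.0


def is_spam(probs, min_dev: float, spam_cutoff: float) -> bool:
    """Two-state decision: spam if the score reaches the cutoff (assumed >=)."""
    return fisher_score(probs, min_dev) >= spam_cutoff


tokens = [0.99, 0.95, 0.9, 0.6, 0.52, 0.3]
print(fisher_score(tokens, min_dev=0.025), is_spam(tokens, 0.025, 0.6))
print(fisher_score(tokens, min_dev=0.25), is_spam(tokens, 0.25, 0.6))
```

With min_dev = 0.25, only tokens with f(w) below 0.25 or above 0.75 count at all. That is why 0.25 looks huge next to 0.025, which discards only the nearly neutral tokens.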

pi
