Suggested default parameter values, was Re: Hapax survival over time

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Thu Mar 25 17:30:16 CET 2004


Tom Anderson <tanderso at oac-design.com> wrote:

>> robx=0.52
>> min_dev=0.375
>> robs=0.0178
>> spam_cutoff=0.99
>> ham_cutoff=0.45 (or 0 if one prefers binary evaluation)
>> 
>> and I am suggesting we make those the new defaults in the bogofilter
>> distribution.  People might like to try them (adjusting the spam cutoff
>
>I don't see anything glaringly dangerous in those numbers.  The cutoffs
>are very conservative from my experience though... I would expect a
>fairly large number of unsures and false negatives with those values.  

That also seems to be my main concern here. Spams are marked
with a very high certainty, but mail labeled as ham seems to
be in great danger of containing lots of false negatives.

That again brings my doubts about the concept of unsures into
play. Why would I want two different classes of errors to
correct? I need to correct false negatives just as I need to
correct unsures. Both categories will likely have a lot in
them. You can't trust the ham or the unsure label in any
sense. I am not sure that helps beginners.
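To make the two modes concrete, here is a minimal sketch of how the
proposed cutoffs split a message's final score into labels. The
function name `classify` is hypothetical, and the exact boundary
comparisons (>= for spam, <= for ham) are an assumption rather than
bogofilter's verified code. What comes from the proposal is the
values spam_cutoff=0.99 and ham_cutoff=0.45, and that ham_cutoff=0
means binary evaluation.

```python
def classify(score: float, spam_cutoff: float = 0.99, ham_cutoff: float = 0.45) -> str:
    """Map a spam score in [0, 1] to a label (hypothetical sketch).

    Assumed rules: scores at or above spam_cutoff are spam. With
    ham_cutoff == 0 (binary evaluation) everything else is ham;
    otherwise scores at or below ham_cutoff are ham and the band in
    between is unsure.
    """
    if score >= spam_cutoff:
        return "Spam"
    if ham_cutoff == 0 or score <= ham_cutoff:
        return "Ham"
    return "Unsure"


if __name__ == "__main__":
    for s in (0.30, 0.60, 0.995):
        # Compare ternary (default cutoffs) with binary (ham_cutoff=0).
        print(s, classify(s), classify(s, ham_cutoff=0))
```

Under these assumptions, a spam scoring 0.60 ends up as "Unsure" in
ternary mode and as "Ham" (a false negative) in binary mode. Either
way it needs correcting by hand, and with a spam cutoff as high as
0.99 the band where that happens is wide.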

>Your cutoffs might be good for a brand new database, 

As Greg points out, the parameters come from the opposite
end: extremely large collections. Well, people who have those
don't need default parameters. But will those parameters suit
starters? I don't know.
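For the starter question, it may help to look at what robs and robx
do to tokens with little data. The sketch below assumes Gary
Robinson's smoothing formula f(w) = (s*x + n*p(w)) / (s + n), with
s = robs and x = robx. It also assumes that min_dev drops tokens whose
f(w) lies within min_dev of 0.5. The function names and the raw
probability p(w) passed in are hypothetical inputs, not bogofilter
internals.

```python
def smoothed_prob(p_w: float, n: int, robs: float = 0.0178, robx: float = 0.52) -> float:
    """Robinson-style smoothed token probability (assumed formula).

    p_w: raw spam probability of the token from the database counts.
    n:   number of times the token has been seen.
    With n == 0 the result is robx; as n grows it approaches p_w.
    """
    return (robs * robx + n * p_w) / (robs + n)


def is_significant(f_w: float, min_dev: float = 0.375) -> bool:
    """Assumed min_dev rule: keep only tokens at least min_dev away from 0.5."""
    return abs(f_w - 0.5) >= min_dev


if __name__ == "__main__":
    unseen = smoothed_prob(0.0, 0)   # an unknown token falls back to robx
    once = smoothed_prob(1.0, 1)     # a token seen once, only in spam
    print(unseen, is_significant(unseen))
    print(once, is_significant(once))
```

With robs as small as 0.0178, a token seen a single time is taken
almost at face value (f(w) is roughly 0.99 here). An unseen token
sits at 0.52 and is ignored under min_dev=0.375. That behavior is
harmless when counts are large, but it is worth keeping in mind for a
small, fresh database.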

pi