default parameters - new vs old vs mine

Tom Anderson tanderso at oac-design.com
Wed Mar 31 03:27:07 CEST 2004


On Tue, 2004-03-30 at 19:11, David Relson wrote:
> For those of you who are curious the attached .tgz file contains the 3
> "hs" messages.  The three messages _are_ ham and also get high scores. 
> Below are the message ids and the scores.
> 
> 5599 0.994920 
> 6115 0.996870 
> 6120 0.991023

I'm slightly disappointed... I thought these would have to be rather
interesting to score so highly for you, but alas they are pretty
normal.  My scores are:

5599 0.000000
6115 0.000053
6120 0.000143

I wonder why they score so spammy for you.  They're not even near my
unsure territory.

> Remember that the above scores are "after the fact", i.e. messages have
> been entered in the wordlists and are now being scored.  The scores the
> messages get today are different from the scores they got when they
> arrived because the wordlist is different.

True.  Still.

I don't keep many emails around, but I think maybe I'll scrape together
whatever is saved in my client to test some numbers against... do you
have a simple procedure for running bogofilter on a bunch of emails and
collecting the results?  A script perhaps?
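
A minimal sketch of such a procedure, assuming one message per file and
that bogofilter's terse mode (-t) prints a single letter and a score per
message (for example "H 0.000053").  The helper names parse_terse and
score_messages are made up for illustration and are not part of
bogofilter:

```python
import subprocess
import sys
from pathlib import Path


def parse_terse(line):
    """Split a terse result line such as 'H 0.000053' into (label, score)."""
    fields = line.split()
    if len(fields) < 2:
        raise ValueError(f"unexpected scorer output: {line!r}")
    return fields[0], float(fields[1])


def score_messages(paths, command=("bogofilter", "-t")):
    """Feed each message file to `command` on stdin and collect the results.

    Assumption: `command` prints one '<letter> <score>' line per message.
    """
    results = []
    for path in paths:
        with open(path, "rb") as msg:
            # bogofilter reports its classification through the exit code,
            # so a nonzero status is normal here; check=True is not used.
            proc = subprocess.run(command, stdin=msg,
                                  capture_output=True, text=True)
        label, score = parse_terse(proc.stdout)
        results.append((Path(path).name, label, score))
    return results


if __name__ == "__main__":
    # Usage (hypothetical script name): python score_all.py saved_mail/*
    for name, label, score in score_messages(sorted(sys.argv[1:])):
        print(f"{name}\t{label}\t{score:.6f}")
```

Passing the command as a parameter makes it easy to compare parameter
sets, since any extra options can be appended to the tuple without
touching the loop.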

> > Just off-hand, I would suggest decreasing robx and increasing robs to
> > better bias it.  But that's just based on my experience.
> 
> You're free to say that, however I've seen bogotune results that
> contradict that idea.

Again with the bogotune... considering how just about everyone involved
in bogofilter has expressed how they aren't entirely certain exactly how
the various algorithms actually work together and why changing certain
values one way or the other has the effect it does, an awful lot of
faith is put into bogotune to magically come up with the best numbers. 
Admittedly, I haven't cracked open the source on it to audit the
procedure it uses, but to me this constant reliance on it despite
contradictory claims seems slightly out of place.  Maybe, just maybe,
bogotune doesn't produce the best possible numbers.  Maybe it finds a
local maximum instead of a global one.  Just my opinion, I could be
wrong.
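
To illustrate the local-versus-global concern, here is a toy sketch.  It
is not bogotune's actual search procedure, which I have not read; the
objective function and the hill_climb helper are invented purely for
the example.  A greedy search on a curve with two peaks ends up on
whichever peak is nearest to its starting point:

```python
def toy_objective(x):
    """Made-up curve: a low peak at x=0.2 and a higher peak at x=0.8."""
    low_peak = 1.0 - 25 * (x - 0.2) ** 2
    high_peak = 2.0 - 25 * (x - 0.8) ** 2
    return max(low_peak, high_peak)


def hill_climb(f, x, step=0.01, lo=0.0, hi=1.0):
    """Greedy search: move to the better neighbour until none improves."""
    while True:
        neighbours = [c for c in (x - step, x + step) if lo <= c <= hi]
        best = max(neighbours, key=f)
        if f(best) <= f(x):
            return x  # no neighbour is better: a (possibly local) maximum
        x = best


if __name__ == "__main__":
    for start in (0.1, 0.6):
        end = hill_climb(toy_objective, start)
        print(f"start={start:.2f} -> x={end:.2f}, f={toy_objective(end):.3f}")
```

Starting at 0.1 the search stops near the low peak at 0.2, while
starting at 0.6 it reaches the higher peak near 0.8.  Whether bogotune
is subject to this depends on how it explores the parameter space.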

Tom
