source code re-organization [was: ballot]

Greg Louis glouis at dynamicro.on.ca
Sat Jan 4 13:07:49 CET 2003


On 20030103 (Fri) at 1752:24 -0500, David Relson wrote:
> At 04:50 PM 1/3/03, Matthias Andree wrote:
> 
> >> 2 - source code re-organization - yay or nay
> >
> >Nay. We might do that at 1.0.0 release time, at which point we'd branch
> >the 1-0-0-stable branch, and reorganize the tree.
> 
> Matthias,
> 
> Since I think the 1-0-0-stable branch is near, I'm willing to wait.

I've voted already (yea to reorg, nay to 1.0) but I'd like to explain
the latter a bit more.  The big 1, to my mind, implies a level of
completeness, stability and general readiness for prime time that
bogofilter is, IMHO, in no shape to claim.  Take gnupg, gnumeric,
mozilla, pan, openssl and ethereal (I'm sure I could think of many
other examples).  The first three are past the 1.0 stage but were
pre-1.0 for years, and the last three are all still 0.x releases.  Yet
each of the six is much farther along its development path than
bogofilter has come in the not-quite-six-months of its existence.

TODO for 1.0 should include, I'd say:

- agree on testing methodology that we all trust, so we don't see
  people write "I haven't tested it yet" when others have done so
  extensively (this has been a major virtue of the Spambayes project)

- finalize the algorithm choice (I think everyone who's seriously
  evaluated each would agree that Robinson-Fisher is the best available
  at present, though I suspect Robinson-BayesChain might deserve further
  evaluation).  That's not to say we mightn't change it if Gary or
  someone else comes up with an even better scheme, but I'd like to see
  bogofilter officially support just one at a time -- serial monogamy
  if needed, but no more polygamy :)

- agree on what MIME parsing we want to do and how it's to be done, and
  do it, and give it time to prove its worth and settle down

(The classifier and the tokenizer are the crucial elements of the
program, obviously.  They need to be right and they need to be stable.)

- develop a sound and sensible HOWTO that explains what the parameters
  (spam cutoff, nonspam cutoff, minimum deviation, s and x) do, how
  they interact, and how to choose values for them.  I think this
  really really matters: we can't claim we're ready for prime time when
  at bottom we don't truly understand what we're doing.  Me, I know in
  theory what they do and a little about how they interact, but there
  is more to be thought through and/or learned before I could claim to
  know how bogofilter's classification really works.  And I doubt that
  I'm unique in my ignorance.  I still see people on the bogofilter
  list doing pure handwaving with x, for example.
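
To make the interaction of those parameters concrete, here is a minimal
sketch of Robinson-Fisher scoring as it is generally described (Robinson's
smoothed per-token estimate, then Fisher's chi-square combination).  It
is not bogofilter's actual code: the function names, the default values
and the "unsure" middle band are assumptions chosen for illustration.

```python
import math

def robinson_f(p, n, s=1.0, x=0.5):
    """Robinson's smoothed estimate for one token.

    p: raw spam probability of the token from training counts
    n: number of times the token was seen in training
    x: value assumed for a token never seen before
    s: strength given to x relative to the observed data
    With s > 0 and 0 < x < 1 the result stays strictly inside (0, 1).
    """
    return (s * x + n * p) / (s + n)

def chi2q(chi2, df):
    """Upper-tail chi-square probability; exact for even df only."""
    m = chi2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def fisher_score(fs, min_dev=0.1):
    """Combine per-token f(w) values into one score in [0, 1]."""
    # Minimum deviation: drop tokens too close to the neutral 0.5.
    kept = [f for f in fs if abs(f - 0.5) >= min_dev]
    if not kept:
        return 0.5  # no evidence either way
    df = 2 * len(kept)
    spamminess = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - f) for f in kept), df)
    hamminess = 1.0 - chi2q(-2.0 * sum(math.log(f) for f in kept), df)
    return (1.0 + spamminess - hamminess) / 2.0

def classify(score, spam_cutoff=0.95, nonspam_cutoff=0.10):
    """Apply the two cutoffs; the cutoff values here are illustrative."""
    if score >= spam_cutoff:
        return "spam"
    if score <= nonspam_cutoff:
        return "nonspam"
    return "unsure"
```

In this sketch s and x matter most for rarely seen tokens (n small pulls
f(w) toward x), the minimum deviation decides which tokens are counted at
all, and the two cutoffs only come into play once the combined score has
been computed.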

That's not by any means a complete list, but I hope it gives the
flavour of why I don't see it as wise to claim 1.0 readiness at this
stage in bogofilter's growth.  If 0.10.0 sounds like regression to
infancy, then I'd propose sticking with 0.9.x, letting x go to 99 or
beyond if need be.

-- 
| G r e g  L o u i s          | gpg public key:      |
|   http://www.bgl.nu/~glouis |   finger greg at bgl.nu |



