source code re-organization [was: ballot]
Greg Louis
glouis at dynamicro.on.ca
Sat Jan 4 13:07:49 CET 2003
On 20030103 (Fri) at 1752:24 -0500, David Relson wrote:
> At 04:50 PM 1/3/03, Matthias Andree wrote:
>
> >> 2 - source code re-organization - yay or nay
> >
> >Nay. We might do that at 1.0.0 release time, at which point we'd branch
> >the 1-0-0-stable branch, and reorganize the tree.
>
> Matthias,
>
> Since I think the 1-0-0-stable branch is near, I'm willing to wait.

I've voted already (yea to reorg, nay to 1.0) but I'd like to explain
the latter a bit more. The big 1, to my mind, implies a level of
completeness, stability and general readiness for prime time that
bogofilter is, IMHO, in no shape to claim. Take gnupg, gnumeric,
mozilla, pan, openssl, ethereal -- I'm sure I could think of many other
examples -- the first three are past the 1.0 stage but were pre-1.0 for
years, and the last three are all still 0.x releases, though each of
the six is much farther along its development path than bogofilter has
come in the not-quite-six-months of its existence.

TODO for 1.0 should include, I'd say:
- agree on a testing methodology that we all trust, so we don't see
people write "I haven't tested it yet" when others have tested
extensively (this has been a major virtue of the Spambayes project)
- finalize the algorithm choice (I think everyone who's seriously
evaluated each would agree that Robinson-Fisher is the best available
at present, though I suspect Robinson-BayesChain might deserve further
evaluation). That's not to say we mightn't change it if Gary or
someone else comes up with an even better scheme, but I'd like to see
bogofilter officially support just one at a time -- serial monogamy
if needed, but no more polygamy :)
- agree on what MIME parsing we want to do and how it's to be done,
do it, and give it time to prove its worth and settle down
(The classifier and the tokenizer are the crucial elements of the
program, obviously. They need to be right and they need to be stable.)
- develop a sound and sensible HOWTO that explains what the parameters
(spam cutoff, nonspam cutoff, minimum deviation, s and x) do, how
they interact, and how to choose values for them. I think this
really, really matters: we can't claim we're ready for prime time when
at bottom we don't truly understand what we're doing. Me, I know in
theory what they do and a little about how they interact, but there
is more to be thought through and/or learned before I could claim to
know how bogofilter's classification really works. And I doubt that
I'm unique in my ignorance. I still see people on the bogofilter
list doing pure handwaving with x, for example.
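A small sketch can pin down what the parameters in the HOWTO item are usually described as doing in Robinson's scheme. s and x smooth each word's probability as f(w) = (s*x + n*p(w)) / (s + n), where n is the number of times the word has been seen. Words with |f(w) - 0.5| below the minimum deviation are ignored. The remaining words are combined with Fisher's chi-square method, and the two cutoffs split the resulting score into spam, nonspam and unsure. This is a minimal Python illustration under stated assumptions, not bogofilter's code. The function names, the message-count-normalized p(w) and every numeric value below are hypothetical.

```python
import math


def chi2q(chi2: float, df: int) -> float:
    """Upper-tail probability of a chi-square statistic, even df only
    (closed-form series)."""
    assert df % 2 == 0
    m = chi2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def robinson_f(bad: int, good: int, nbad: int, ngood: int,
               s: float, x: float) -> float:
    """Smoothed word probability f(w) = (s*x + n*p(w)) / (s + n).

    s: strength given to the prior x; x: assumed spam probability of an
    unseen word. Assumes s > 0, 0 < x < 1, nbad > 0, ngood > 0.
    p(w) normalizes by message counts (an assumption of this sketch).
    """
    n = bad + good
    if n == 0:
        return x  # never-seen word: f(w) falls back to x
    bad_ratio = bad / nbad
    good_ratio = good / ngood
    p = bad_ratio / (bad_ratio + good_ratio)
    return (s * x + n * p) / (s + n)


def spamicity(fs, min_dev: float) -> float:
    """Fisher combining: drop words within min_dev of 0.5, then
    score = (1 + H - S) / 2, near 1 for spam and near 0 for nonspam."""
    used = [f for f in fs if abs(f - 0.5) >= min_dev]
    if not used:
        return 0.5  # no informative words left
    n = len(used)
    h_prob = chi2q(-2.0 * sum(math.log(f) for f in used), 2 * n)
    s_prob = chi2q(-2.0 * sum(math.log(1.0 - f) for f in used), 2 * n)
    return (1.0 + h_prob - s_prob) / 2.0


def classify(score: float, spam_cutoff: float, ham_cutoff: float) -> str:
    """Three-way decision using the spam and nonspam cutoffs."""
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "ham"
    return "unsure"


if __name__ == "__main__":
    # Hypothetical (bad, good) counts for the tokens of one message.
    counts = [(40, 1), (25, 0), (3, 30), (5, 5)]
    fs = [robinson_f(b, g, nbad=1000, ngood=1000, s=1.0, x=0.5)
          for b, g in counts]
    score = spamicity(fs, min_dev=0.1)
    print(round(score, 3), classify(score, spam_cutoff=0.95, ham_cutoff=0.1))
```

Even in a toy like this, s and x interact: f(w) is a weighted average of x and p(w) with weights s and n, so a large s lets x dominate every word until its count is well above s.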

That's not by any means a complete list, but I hope it gives the
flavour of why I don't see it as wise to claim 1.0 readiness at this
stage in bogofilter's growth. If 0.10.0 sounds like regression to
infancy, then I'd propose sticking with 0.9.x, letting x go to 99 or
beyond if need be.
--
| G r e g L o u i s | gpg public key: |
| http://www.bgl.nu/~glouis | finger greg at bgl.nu |