Modularity
Nick Simicich
njs at scifi.squawk.com
Mon Jan 13 22:10:34 CET 2003
At 11:53 PM 2003-01-12 -0200, Adriano Nagelschmidt Rodrigues wrote:
>Why? I was thinking exactly about the "-u" switch when I listed "bogolearn" as
>a possibility.
99% of my invocations of bogofilter (ok, more than that) are with the -u
option. About 99% of the remaining 1% are -S or -N invocations. I pretty
much reclassify all of my misclassified mail.
>C'mon, modularity is the UNIX way. We like it :-)
In my opinion, the main reason for the extreme modularity that is
traditional in Unix is limited segment size. As I recall, early setups
(like PC/IX, the port of Unix to the PC/XT) had a 64k I segment size and a
64k D segment size. You simply could not run big, complex programs in
segments that small. Nor was it convenient to compile them on the
machines of the time. I probably have a copy of PC/IX around here
somewhere, and I may even have an installed copy, but it was simply
not that interesting: no support for LAN networking came with it. It
could function as a UUCP node and support multiple simultaneous logins. I
do not believe that this was the only architecture that limited segment
size so strongly.
Another reason for modularity is to make things simpler, so that they are
more likely to be correct. This program seems reasonably correct at its
current size.
If you split the programs on me, you would exchange one module load for a
shell script and two module loads, and have all of the data move over a
pipe or something, rather than through the current in-memory transfer. How
much inefficiency do you want to tolerate?
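
To make the comparison concrete, here is a rough sketch of the two
structures: one program that hands data over in memory, and a split
version that starts a second process and moves the same data over a pipe.
It is not bogofilter code. The tokenize and count_tokens functions, the
CHILD script, and the sample message are all made up for illustration,
and the timings it prints will vary by machine.

```python
import subprocess
import sys
import time


def tokenize(text):
    """Hypothetical first stage: split a message into words."""
    return text.split()


def count_tokens(tokens):
    """Hypothetical second stage: tally how often each word occurs."""
    counts = {}
    for tok in tokens:
        counts[tok] = counts.get(tok, 0) + 1
    return counts


def run_in_process(text):
    # One program: the token list is handed over in memory.
    return count_tokens(tokenize(text))


# The same first stage as a separate program, writing one token per line.
CHILD = "import sys\nfor tok in sys.stdin.read().split():\n    print(tok)\n"


def run_split(text):
    # Two programs: start a second interpreter and move the tokens
    # over a pipe before counting them.
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        input=text, capture_output=True, text=True, check=True,
    )
    return count_tokens(proc.stdout.split())


def timed(fn, text, repeat=5):
    # Average wall-clock seconds per call over a few repetitions.
    start = time.perf_counter()
    for _ in range(repeat):
        fn(text)
    return (time.perf_counter() - start) / repeat


if __name__ == "__main__":
    message = "buy cheap pills now " * 2000
    print("in-process: %.6f s" % timed(run_in_process, message))
    print("split:      %.6f s" % timed(run_split, message))
```

Both paths produce the same counts; the only difference being measured
is the extra process start and the trip through the pipe. Numbers from a
fast machine say little about a small one, so it is worth running
something like this on the slowest box you care about.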
Right now, it is still possible to run this program on light iron. I run it
on a P-90, but that pushes it. Tripling (or more) the work that it takes to
do the job for nothing but purity is bogus, if you ask me.
If you want to have three man pages to make the arguments "pure", install
the program with two aliases, and have it act differently depending on the
alias that is called. Then you can write one program with three man pages
and command interpretations. You can have your conceptual simplicity
without sacrificing efficiency.
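
The alias trick can be sketched as follows. This is an illustration of
the general technique of dispatching on the invoked name, not how
bogofilter actually works. Only "bogolearn" was floated in this thread;
"bogoupdate" and the mode names are invented for the example.

```python
import os
import sys

# Hypothetical mapping from invoked name to behaviour.
MODES = {
    "bogofilter": "classify",
    "bogolearn": "register",
    "bogoupdate": "classify-and-register",
}


def mode_for(argv0):
    """Pick a behaviour from the name the program was invoked under."""
    name = os.path.basename(argv0)
    # Fall back to plain classification for any unrecognised name.
    return MODES.get(name, "classify")


def main(argv):
    mode = mode_for(argv[0])
    print("running in %s mode" % mode)
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Installed once, with the other names as links to the same file (for
example "ln -s bogofilter bogolearn"), each name can have its own man
page and argument set while only one program is ever loaded.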
As someone pointed out, another important reason for modularity is when the
intermediate output is useful to, um, something in general. The output of
wc might be useful to a program or to a human. This is not such a case: if
the programs were piped together, the output from program A would always be
passed to program B and would likely not even be an external interface.
But please do not blindly do this split without doing some performance
studies, including some on small machines. I have been wrong about
performance guesses in the past, but I do not think I am this time.
--
Take The Boulder Pledge Today
"Under no circumstances will I ever purchase anything offered to me as the
result of an unsolicited e-mail message. Nor will I forward chain letters,
petitions, mass mailings, or virus warnings to large numbers of others.
This is my contribution to the survival of the online community." - Roger
Ebert -- nor will I vote for any candidate who solicits my vote via e-mail.
Nick Simicich - njs at scifi.squawk.com