Modularity

Nick Simicich njs at scifi.squawk.com
Mon Jan 13 22:10:34 CET 2003


At 11:53 PM 2003-01-12 -0200, Adriano Nagelschmidt Rodrigues wrote:

>Why? I was thinking exactly about the "-u" switch when I listed "bogolearn" as
>a possibility.

99% of my invocations of bogofilter (ok, more than that) are with the -u 
option.  About 99% of the remaining 1% are -S or -N invocations.  I pretty 
much reclassify all of my misclassified mail.

>C'mon, modularity is the UNIX way. We like it :-)

In my opinion, the main reason for the extreme modularity that is 
traditional in Unix is limited segment size.  As I recall, early setups 
(like PC/IX, the port of Unix to the PC/XT) had a 64k instruction (I) segment 
and a 64k data (D) segment.  You simply could not run big complex programs in 
segments that small, nor was it convenient to compile them on the machines of 
the time.  Although it is likely that I have a copy of PC/IX around here 
somewhere, and I may even have an installed copy, it was simply not that 
interesting - no support for LAN networking came with it.  It 
could function as a UUCP node, and support multiple simultaneous logins.  I 
do not believe that this was the only architecture that limited segment 
size so strongly.

Another reason for modularity is to make things simpler, so that they are 
more likely to be correct.  This program seems reasonably correct at its 
current size.

If you split the program on me, you would exchange one module load for a 
shell script and two module loads, and all of the data would have to move 
over a pipe or something similar rather than through the current in-memory 
transfer.  How much inefficiency do you want to tolerate?

Right now, it is still possible to run this program on light iron: I run it 
on a P-90, but that pushes it.  Tripling the work (or more) that it takes to 
do the job for nothing but purity is bogus if you ask me.

If you want to have three man pages to make the arguments "pure", install 
the program with two aliases, and have it act differently depending on the 
name it is called by.  Then you can write one program with three man pages 
and three command interpretations. You can have your conceptual simplicity 
without sacrificing efficiency.
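
A minimal sketch of the alias idea in C, not taken from the bogofilter 
source.  It assumes the one binary is installed under three names: 
"bogofilter" itself, "bogolearn" (the name floated earlier in this thread), 
and "bogoupdate", which is a made-up name for a classify-and-update alias.  
The mode names and their meanings are likewise only illustrative.

```c
#include <string.h>

/* Hypothetical modes; these names are illustrative, not bogofilter's own. */
enum run_mode { MODE_CLASSIFY, MODE_LEARN, MODE_CLASSIFY_AND_UPDATE };

/* Return the last path component of argv[0] without modifying it. */
static const char *base_name(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Pick the behaviour from the name the binary was invoked under. */
static enum run_mode mode_from_argv0(const char *argv0)
{
    const char *name = base_name(argv0);

    if (strcmp(name, "bogolearn") == 0)
        return MODE_LEARN;
    if (strcmp(name, "bogoupdate") == 0)      /* assumed alias name */
        return MODE_CLASSIFY_AND_UPDATE;
    return MODE_CLASSIFY;                     /* any other name: plain classify */
}

/* In main(), the first step would be something like:
 *     enum run_mode mode = mode_from_argv0(argv[0]);
 * followed by the usual option parsing, with mode setting the defaults. */
```

The aliases would be hard or symbolic links to the same file, so there is 
still one module load and no pipe between stages.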

As someone pointed out, another important reason for modularity is if the 
intermediate output is useful to, um, something in general.  The output of wc 
might be useful to a program or to a human.  This is not that case: if the 
programs were piped together, the output from program A would always be passed 
to program B and would likely not even be an external interface.

But please do not blindly do this split without doing some performance 
studies, including some on small machines.  I have been wrong about 
performance guesses in the past, but I do not think I am this time.

--
Take The Boulder Pledge Today
"Under no circumstances will I ever purchase anything offered to me as the 
result of an unsolicited e-mail message. Nor will I forward chain letters, 
petitions, mass mailings, or virus warnings to large numbers of others. 
This is my contribution to the survival of the online community."  - Roger 
Ebert -- nor will I vote for any candidate who solicits my vote via e-mail.
Nick Simicich - njs at scifi.squawk.com


