In this chapter we explore in more detail the software development methodology used with CDTs. It is a methodology based on transformation. Many of the transformations useful for list programming were already known informally in the Lisp community, and more formally in the APL and functional programming communities. The chief contributions of the categorical data type perspective are:
a guarantee that the set of transformation rules is complete (which becomes important for more complex types); and
a style of developing programs that is terse but expressive.
This style has been extensively developed by Bird and Meertens, and by groups at Oxford, Amsterdam, and Eindhoven. A discussion of many of the stylistic and notational issues, and a comparison of the Bird–Meertens approach with Eindhoven quantifier notation, can be found in [17]. Developments in the Bird–Meertens style are an important interest of IFIP Working Group 2.1.
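As a small, hedged illustration of the style (rendered here in Haskell rather than in the Bird–Meertens notation of [17]; the function names are ours, not the book's):

    -- Two standard equational rules of the style:
    --   map f . map g   =  map (f . g)            (map fusion)
    --   sum (xs ++ ys)  =  sum xs + sum ys        (sum is a ++-homomorphism)

    sumSquares :: [Int] -> Int
    sumSquares = sum . map (^ 2)

    -- The homomorphism rule licenses a parallel evaluation: split the
    -- list, process the halves independently, and combine the results.
    sumSquaresSplit :: [Int] -> Int
    sumSquaresSplit xs = sumSquares front + sumSquares back
      where (front, back) = splitAt (length xs `div` 2) xs

Developments proceed by applying such rules equationally, each step preserving meaning while changing cost.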
An Integrated Software Development Methodology
A software development methodology must handle specifications that are abstract, large, and complex. The categorical data type approach we have been advocating plays only a limited role in such a methodology because it is restricted (at the moment) to a single data type at a time. Although it is useful for handling the interface to parallel architectures, it is too limited, by itself, to provide the power and flexibility needed for large application development.
We have shown how to build categorical data types for the simple type of concatenation lists. In this chapter we show the data type construction in its most general setting. While there is some overhead to understanding the construction in this more general setting, the generality is needed to build much more complex types. We illustrate this in subsequent chapters by building types such as trees, arrays, and graphs.
More category theory background is assumed in this chapter. Suitable references are [127,158].
Categorical Data Type Construction
The construction of a categorical data type is divided into four stages:
The choice of an underlying category of basic types and computations on them. This is usually the category Type, but other possibilities will certainly be of interest.
The choice of an endofunctor, T, on this underlying category. The functor is chosen so that its effect on the still-hypothetical constructed type is to unpack it into its components. Components are chosen by considering the type signatures of constructors that seem suitable for the desired type. When this endofunctor is polynomial it has a fixed point, and this fixed point is defined to be the constructed type.
The construction of a category of T-algebras, T-Alg, whose objects are algebras of the new type and their algebraic relatives, and whose arrows are homomorphisms on the new type. The constructed type algebra (the free type algebra) is the initial object in this category. The unique arrows from it to other algebras are catamorphisms.
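These stages can be made concrete for concatenation lists. The following Haskell sketch is illustrative only: the names ListF, Fix, and cata are ours, and Haskell types only approximate the categorical setting.

    -- The endofunctor T: unpacking a list into its components, namely
    -- empty, a singleton, or the concatenation of two lists.
    data ListF a x = Empty | Single a | Join x x

    instance Functor (ListF a) where
      fmap _ Empty      = Empty
      fmap _ (Single a) = Single a
      fmap f (Join l r) = Join (f l) (f r)

    -- The fixed point of the functor is the constructed type.
    newtype Fix f = In (f (Fix f))
    type CList a = Fix (ListF a)

    -- A T-algebra with carrier b is an arrow  T b -> b ; the unique
    -- arrow from the initial algebra to it is the catamorphism.
    cata :: Functor f => (f b -> b) -> Fix f -> b
    cata alg (In t) = alg (fmap (cata alg) t)

    -- Example: summation as a catamorphism on concatenation lists.
    sumC :: CList Int -> Int
    sumC = cata alg
      where alg Empty      = 0
            alg (Single n) = n
            alg (Join l r) = l + r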
The central theme of this book is that the structure of a computation on a data type reflects the structure of the data type. This is true in two senses:
Any homomorphism on a data type is intimately related to the algebraic structure of its codomain, which can be exploited in the search for programs; and
The evaluation of any homomorphism can follow the structure of its argument, which can be exploited in the evaluation of programs.
Structured data types and the homomorphisms on them, called catamorphisms, form a programming model for parallel computation that has many attractive properties.
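As a hedged sketch of both senses (the names below are ours): a function h on concatenation lists is a homomorphism when h [] = e, h [a] = f a, and h (xs ++ ys) = h xs <+> h ys, for some associative <+>. The first sense says h is determined entirely by the algebra (e, f, <+>) of its codomain, so searching for a program amounts to searching for that algebra. The second sense says the last equation lets evaluation mirror any splitting of the argument:

    -- Illustrative: evaluate a list homomorphism by repeated halving,
    -- so the combining steps form a balanced tree that could be
    -- executed in parallel. (<+>) is assumed associative.
    hom :: (b -> b -> b) -> (a -> b) -> b -> [a] -> b
    hom (<+>) f e = go
      where go []  = e
            go [a] = f a
            go xs  = go front <+> go back
              where (front, back) = splitAt (length xs `div` 2) xs

    -- For example, hom (+) (const 1) 0 computes length, and
    -- hom (++) (: []) [] copies its argument.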
There is a desperate need for a model of parallel computation that can decouple software from hardware. This decoupling occurs in two dimensions: decoupling the rate of change of parallel hardware (high) from that of parallel software (low, if it is to be economic); and decoupling the variety of parallel hardware from a single, architecture-independent version of the software.
Such a model is hard to find because the requirements are mutually in tension. A model must be opaque enough to hide target architectures and the complexity of parallel execution, while providing a semantic framework that is rich enough to allow software development. At the same time, it must be partly translucent so that the costs of programs can be visible during development, to allow intelligent choices between algorithms.
In this chapter we build a much more complex type, the type of arrays. The construction of arrays as a categorical data type is significantly different from, and more complex than, the constructions seen so far, so this chapter illustrates new aspects of the construction technique.
Arrays are an important type because of the ubiquitous use of Cartesian coordinate systems and the huge edifice of linear algebra built on top of them. Almost all scientific and numeric computations require arrays as central data structures.
While the need for a data type of arrays is undisputed, there has always been some disagreement about exactly what arrays should represent: should the entries of an array all be of the same type (homogeneous), or might they be different (inhomogeneous)? Should extents be the same in each dimension (rectangular), or might they differ (ragged)? Are arrays of different sizes members of the same type or of different types? Different programming languages have answered these questions differently.
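The design space can be made concrete with a small Haskell illustration (ours, not the book's):

    import Data.Array

    -- Nested lists permit ragged shapes: rows may differ in extent.
    ragged :: [[Int]]
    ragged = [[1], [2, 3, 4], []]

    -- An array indexed by (row, column) pairs is rectangular: one
    -- extent per dimension, fixed when the array is built.
    rect :: Array (Int, Int) Int
    rect = listArray ((0, 0), (1, 2)) [1 .. 6]   -- a 2 x 3 array

    -- In both cases the entries share a single element type
    -- (homogeneity), and size is a value rather than part of the
    -- type, so arrays of different sizes inhabit the same type.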
Arrays appeared early in languages such as Fortran, which had homogeneous, rectangular arrays but was ambivalent about how they should be typed. Arrays had to be declared with their shapes, and shapes of arguments and parameters had to agree (although some of these rules were relaxed later). Fortran even went so far as to reveal the storage allocation of arrays (by columns) at the language level.
In this chapter, we explore the constraints imposed on models by the properties of parallel architectures. We are concerned, of course, only with theoretical properties, because we cannot predict technological properties very far into the future. Recent foundational results, particularly by Valiant [200], show that arbitrary parallel programs can be emulated efficiently on certain classes of parallel architectures, but that inefficiencies are unavoidable on others. Thus a model of parallel computation that expresses arbitrary computations cannot be efficiently implementable over the full range of parallel architecture classes. The difficulty lies primarily in the volume of communication that takes place during computations. We are therefore driven to choose between two quite different approaches to designing models: accepting some inefficiency, or restricting communication in some way.
Parallel Architectures
We consider four architecture classes:
shared-memory MIMD architectures, consisting of processors executing independently, but communicating through a shared memory, visible to them all;
distributed-memory MIMD architectures, consisting of processors executing independently, each with its own memory, and communicating using an interconnection network whose capacity grows as p log p, where p is the number of processors;
distributed-memory MIMD architectures, consisting of processors executing independently, each with its own memory, and communicating using an interconnection network whose capacity grows only linearly with the number of processors (that is, the number of communication links per processor is constant);
SIMD architectures, consisting of a single instruction stream, broadcast to a set of data processors whose memory organisation is either shared or distributed.
So far we have discussed the properties that a model of parallel computation ought to have and have claimed that models built from categorical data types have these properties. In this chapter we show how to build a simple but useful categorical data type, the type of join or concatenation lists, and illustrate its use as a model. We show how such a model satisfies the requirements, although some of the details are postponed to later chapters.
The language we construct for programming with lists does not differ from other parallel list languages in major ways: most of the list operations are familiar maps, reductions, and prefixes. The differences lie in the infrastructure that comes from the categorical data type construction: an equational transformation system, a deeper view of what operations on lists are, and a style of program development. When we develop more complex types, the construction suggests new operations that are not obvious from first principles.
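For concreteness, here are the three kinds of operation in Haskell's standard vocabulary (a minimal sketch; the notation used in this book differs, but the types correspond):

    squares :: [Int] -> [Int]
    squares = map (^ 2)          -- map: apply a function to every element

    total :: [Int] -> Int
    total = foldr1 (+)           -- reduction with an associative operation

    runningTotals :: [Int] -> [Int]
    runningTotals = scanl1 (+)   -- prefix (scan): all partial reductions

    -- e.g. runningTotals [1, 2, 3, 4] == [1, 3, 6, 10]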
For the next few chapters we concentrate on aspects of parallel computation on lists. We describe the categorical data type construction in more detail in Chapter 9 and move on to more complex types. The next few sections explain how to build lists in a categorical setting. They may be skipped by those who are not interested in the construction itself. The results of the construction and its implications are summarised in Section 5.5.
We have already discussed why a set of cost measures is important for a model of parallel computation. In this chapter we develop something stronger, a cost calculus. A cost calculus integrates cost information with equational rules, so that it becomes possible to decide the direction in which an equational substitution is cost-reducing. Unfortunately, a perfect cost calculus is not possible for any parallel programming system, so some compromises are necessary. It turns out that the simplicity of the mapping problem for lists, thanks to the standard topology, is just enough to permit a workable solution.
Cost Systems and Their Properties
Ways of measuring the cost of a partially developed program are critical to making informed decisions during the development. An ideal cost system has the following two properties:
It is compositional, so that the cost of a program depends in some straightforward way on the cost of its pieces. This is a difficult requirement in a parallel setting since it amounts to saying that the cost of a program piece depends only on its internal structure and behaviour and not on its context. However, parallel operations have to be concerned about the external properties of how their arguments and results are mapped to processors since there are costs associated with rearranging them. So, for parallel computing, contexts are critically important.
It is related to the calculational transformation system, so that the cost of a transformation can be associated with its rule.
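As a hedged illustration of the second property (the figures below are standard data-parallel estimates, not values derived in this chapter): on p processors, with lists of length n and unit-cost element operations, map f costs roughly n/p steps and a reduction roughly n/p + log p steps. Attaching these costs to the rule

    map f . map g  =  map (f . g)

shows the left side making two traversals (about 2n/p steps) and the right side one (about n/p steps), so the rule is cost-reducing when applied from left to right. A cost calculus records such a direction for each equational rule.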
In this chapter we define the categorical data types of graphs. Graphs are ubiquitous in computation, but they are subtly difficult to work with. This is partly because there are many divergent representations for graphs and it is hard to see past the representations to the essential properties of the data type.
We follow the now-familiar strategy of defining constructors and building graphs as the fixed point of the resulting polynomial functor.
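For illustration only, here is one possible choice of constructors, written in Haskell; it is not necessarily the choice developed in this chapter. These four constructors induce a polynomial functor of the shape G(X) = 1 + V + (X x X) + (X x X), and graphs are its fixed point:

    data Graph v = EmptyG
                 | Vertex v
                 | Overlay (Graph v) (Graph v)  -- union of vertices and edges
                 | Connect (Graph v) (Graph v)  -- union, plus an edge from
                                                -- every left vertex to every
                                                -- right vertex

    -- The corresponding catamorphism, by analogy with lists:
    foldG :: b -> (v -> b) -> (b -> b -> b) -> (b -> b -> b) -> Graph v -> b
    foldG e v o c = go
      where go EmptyG        = e
            go (Vertex x)    = v x
            go (Overlay l r) = o (go l) (go r)
            go (Connect l r) = c (go l) (go r)

    -- e.g. foldG 0 (const 1) (+) (+) counts vertex occurrences.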
Graphs have a long history as data structures. Several specialised graph languages have been built (see, for example, [69]), but they all manipulate graphs using operations that alter single vertices and edges, rather than the monolithic operations we have been advocating.
An important approach to manipulating graphs is graph grammars. Graph grammars [71] are analogues of grammars for formal languages and build graphs by giving a set of production rules. Each left-hand side denotes a graph template, while each right-hand side denotes a replacement. A rule is applied to a subgraph that matches its left-hand side. The matching subgraph is removed from the graph (really a graph form) and replaced by the right-hand side of the rule. Various conventions are used to add edges connecting the new subgraph to the rest of the original graph. Graph grammars are primarily used for capturing structural information about the construction and transformation of graphs. They do not directly give rise to computations on graphs.
The following concerns are addressed in this concluding chapter:
An assessment of the development of MUSE and MUSE*/JSD. For instance, have appropriate case-studies and tests been used to support the development and demonstration of the methods?
An assessment of the methodological characteristics of MUSE and MUSE*/JSD. For instance, is their scope of human factors design appropriate? Are requirements identified in Chapters One and Two satisfied by MUSE and MUSE*/JSD, and have they any limitations?
A review of potential developments of MUSE and MUSE*/JSD. For instance, how could the methods be enhanced with respect to (a), (b) and (c) above? What computer-based tools could be developed to support the methods; and should declarative human factors knowledge be collated and integrated with them to facilitate method application at each stage of system development?
These concerns are discussed in turn in the sub-sections that follow.
An Overview and Assessment of Method Development Activities of MUSE and MUSE*/JSD
Generally, activities for developing the methods were implemented as planned, e.g. literature surveys; specification and testing of method conceptions; case-study selection, planning and familiarisation; etc. How the key concerns of method development were addressed is assessed below:
Case-study selection. It was clear that the number of case-studies undertaken specifically to develop and test the methods would be limited by the resources available. Thus, considerable care was devoted to the planning and selection of appropriate case-studies for developing and testing the methods.
The last thing one knows in constructing a work is what to put first.
Blaise Pascal, 1909, Pensées
The meaning of things lies not in the things themselves but in our attitude towards them.
Antoine de Saint-Exupéry
Having developed a structured method that supports human factors specification at each stage of system development (namely MUSE), its explicit integration with similarly structured software engineering methods may be considered. In this way, the problems associated with the ‘too-little-too-late’ contribution of human factors to system development may be addressed more completely (see Chapter One). To this end, the following concerns of methodological integration are discussed in this chapter:
(a) A conception of what constitutes an integration of structured human factors and software engineering methods. The requirements to be satisfied by the integrated method are thus defined.
(b) The pre-requisites and issues to be addressed during the integration of structured human factors and software engineering methods.
The above concerns are reviewed generally, followed by an illustration of how they have been addressed in the integration of MUSE (the structured human factors method) with the Jackson System Development (JSD) method (a structured software engineering method). For completeness, and to provide a contrast with the latter work, other integrations of human factors with structured software engineering methods (work undertaken elsewhere) are also reviewed. Three structured software engineering methods are covered in this review, namely the Jackson System Development (JSD) method; the Structured Systems Analysis and Design Method (SSADM); and the Structured Analysis and Structured Design (SASD) method.
To be still searching what we know not, by what we know…
Milton, 1644, Areopagitica
Leaving the old, both worlds at once they view, That stand upon the threshold of the new.
Edmund Waller, 1606–1687
In this chapter, the stages of the Design Synthesis Phase of the method, namely the Statement of User Needs Stage, the Composite Task Model Stage, and the System and User Task Model Stage, are described in the order in which design is advanced. The account includes the design products derived to support target system specification using the method. Using the format outlined in Chapter Three, the design activities of each stage are described in terms of sub-processes that transform inputs into a number of products. As in Chapter Four, case-study examples are used to illustrate the products.
The Statement of User Needs (SUN) Stage
The Statement of User Needs Stage summarises the conclusions of extant systems analysis and defines user requirements for the target system. Thus, the information collated would include a mixture of the following:
(a) existing user needs and problems;
(b) existing design requirements, rationale and constraints;
(c) rationale underlying extant design features to be ported to the target system;
(d) performance criteria and domain semantics for the target system.
The primary purpose of the products derived at this stage is to establish constraints to support later design decisions and extensions, e.g. during the synthesis of task models at the Composite Task Model Stage.
Figure 5-1 shows the location of the Statement of User Needs Stage relative to other stages of the method (the stage is indicated by a box outlined in bold).