We use cookies to distinguish you from other users and to provide you with a better experience on our websites. Close this message to accept cookies or find out how to manage your cookie settings.
To save content items to your account,
please confirm that you agree to abide by our usage policies.
If this is the first time you use this feature, you will be asked to authorise Cambridge Core to connect with your account.
Find out more about saving content to .
To save content items to your Kindle, first ensure [email protected]
is added to your Approved Personal Document E-mail List under your Personal Document Settings
on the Manage Your Content and Devices page of your Amazon account. Then enter the ‘name’ part
of your Kindle email address below.
Find out more about saving to your Kindle.
Note you can select to save to either the @free.kindle.com or @kindle.com variations.
‘@free.kindle.com’ emails are free but can only be saved to your device when it is connected to wi-fi.
‘@kindle.com’ emails can be delivered even when you are not connected to wi-fi, but note that service fees apply.
This chapter is devoted to the semantics of the functional language that we described in the previous chapters. The point of its semantics is to define the meaning of expressions in this language; that is, to define precisely the value of each expression. The association between an expression and its value is created by rewrite rules; that is, rules that transform expressions textually. Those rules are presented and discussed in Section 3.1.
These rewrite rules are non-deterministic. That is, in general, for any expression under consideration, there is more than one rule that may be applied to it. The consistency of these rules rests on the fact that they form a convergent system. In other words, whatever the non-deterministic choices made, at every step it is always possible to make any two different computations converge toward the same expression. This property does not exclude the existence of infinite computations, but it does exclude the possibility of an expression having two distinct values. The value of an expression (when it exists), is therefore unique. We assert this convergence property here, but we will not try to prove it. References about the proof of convergence are found at the end of this chapter.
In practice, in order to implement an evaluator for a language, we have to define a strategy that lets us choose a rewrite at every step—we choose one such rewrite among the set of all possible rewrites.
In this chapter, we show you how to represent exact numbers of arbitrary size. In certain applications, our ability to compute with such numbers is indispensible, especially so in computer algebra. Formal systems of symbolic computation, such as Maple [14], Mathematica [44], or Axiom [19], exploit an exact rational arithmetic. Moreover, programming languages oriented toward symbolic computation generally support exact computations. Such is particularly the case of Caml with the libraries bignum and ratio.
The sets of numbers that we will treat here are the natural numbers (also known as integers for counting), the signed integers (that is, both positive and negative), and the rational numbers. The natural numbers will be represented by the sequence of their digits in a given base. The sequence itself can be represented in various ways. We will represent natural numbers primarily by ordinary lists. This choice is not very efficient because it supports traversal in only one direction. If we decide to put least significant digits at the head of the list, then we can multiply and add fairly efficiently, but division will be inefficient because we must then turn the lists around.
Nevertheless, if we represent natural numbers as lists, then we can program the usual operations simply, and that model can serve later as the point of reference for getting into various other representations, such as representations by doubly linked circular lists or by arrays—representations used in “real” implementations.
This book has a number of objectives. First, it provides the concepts and a language to produce sophisticated software. Second, the book tries to make you step back a bit from programming as an activity by highlighting basic problems linked to programming as a discipline. In the end, we hope to share our own pleasure in programming.
The language we use—Caml—makes it possible to achieve all these goals. Caml belongs to the family of “functional” languages, all of which have the following qualities:
They are particularly well adapted to writing applications for “symbolic computation”—the kind of computing that concerns computer scientists as well as mathematicians—in software engineering, artificial intelligence, formal computation, computer-aided proof, and so forth.
Functional languages are built on a fundamental theory that derives from mathematical logic. This basis provides these languages with their semantics as well as their systems of types and proof.
By the very way in which they are designed, these languages support a certain aesthetic in programming, an aesthetic which, like the aesthetic of a mathematical proof, is often an indication of their quality.
This book grew primarily out of a programming course given by Guy Cousineau at the Ecole Normale Supérieure between 1990 and 1995. The book also benefited from the teaching experience of Michel Mauny, who wrote Chapters 8 and 13 and contributed to the overall consistency of the book.
The spectacular development of the computing industry depends largely on progress in two very different areas: hardware and software. Progress in hardware has been fairly quantitative: miniaturized parts, increased performance, cost cutting; whereas the progress in software has been more qualitative: ease of use, friendliness, etc.
In fact, most users see their computer only through interfaces that let them exploit the machine while ignoring practically all its structure and internal details, just as if we drove our cars without ever opening the hood, just like we enjoy the comfort of central heating without necessarily grasping thermodynamics.
This qualitative improvement was brought to us by progress in software as an independent discipline. It is based on a major research effort, in the course of which computer science has been structured little by little around its own concepts and methods. Those concepts and methods, of course, should be the basis for teaching computer science.
The most fundamental concept in computer science is computing, of course. A computation is a set of transformations carried out “mechanically” by means of a finite number of predefined rules. A computation impinges on formalized symbolic data (information) representing, for example, numbers (as in numeric computations) or mathematical expressions (as in formal computation) or data or even knowledge of all kinds. The only characteristics common to all computations is the discreteness of their data (that is, the information is finite) and the mechanical way in which the rules are applied.
This last part of the book describes techniques to implement a language like Caml. We do not pretend to give a complete description here of an implementation of Caml, but rather a demonstration that such an implementation is feasible. We treat a subset of Caml to show the major difficulties in compilation and type synthesis.
Chapter 11 defines a Caml evaluator in Caml. It highlights the main ideas that make it possible to produce a compiler: the idea of an environment is used to manage variables, and the idea of closure is used to represent functional values.
Chapter 12 tackles two topics simultaneously: compilation schema and techniques of memory management that come into play in the implementation of a functional language. With respect to memory management, only allocation is described precisely. Techniques for recovering memory (that is, garbage collection) are only briefly touched.
The set of machine instructions we use occurs at a relatively abstract level compared to all the instructions available in assembly language, but that set can nevertheless be translated into true machine instructions quite directly.
Chapter 13 describes a type synthesizer. We give you a preliminary version of it in a purely functional style; then we move on to a more efficient one, one that uses a destructive variety of the unification algorithm. This version is quite close to the actual type synthesizer in Caml.
This book includes a great many examples. These examples are presented as if they were typed on the keyboard of a computer during an interactive session in Caml. In consequence, they include lines written by the user along with responses from the system. The character # that appears at the beginning of examples is the system prompt. Text written by the user begins after that character and ends with a double semi-colon (;;). Everything between the # and the double semi-colon is thus due to the user. The rest is the system response. This system response includes information about type and value.
In the Caml system, the type of expressions entered by the user is computed statically, that is, computed before evaluation. Any possible inconsistencies in type are detected automatically and reported by an error message. This kind of type-checking is carried out without the user ever having to give any indication to the system about types—no type declarations as in other languages, like Pascal or C.
Once the types have been synthesized satisfactorily, evaluation takes place, and then a result is computed and displayed. This display takes one of two different forms, depending on whether the text entered by the user is a simple expression or a definition.
The examples given here differ a bit in typography from those that actually appear on screen during a real working session; we modified them for legibility and aesthetic reasons.
All those aspects of Caml that cannot be described in a purely functional view of the language are known as its imperative qualities
either because they make sense only with respect to a particular evaluation strategy,
or because they refer to the machine representation of data structures.
Among the imperative aspects of that first kind, there are exceptions and input-output.
Among the second kind of imperative aspects, we find destructive operations such as assignment. The effect of such operations can be explained completely only by reference to formal semantics or to a description of the implementation of data structures. (We will get to those ideas later in Chapter 12.) However, we can still give you a reasonable description here based on examples.
Exceptions
In Section 2.3.4, we touched on the problem of writing partial functions. To do that, we introduced the type
This solution can hardly take into account all the situations where we need partial functions. For example, division is a partial operation (since division by 0 (zero) is not defined), but it would not be practical to replace the types int and float by the types int option and float option in every numeric calculation because doing so assumes that all arithmetic operations foresee the case where one of their arguments is undefined. The chief effect of that assumption would be to make numeric calculations impractical simply because they would be too inefficient to perform!
Programs have to handle highly varied objects organized into categories, such as numbers, text, images, formulas, as well as other programs. These objects are for the most part foreign to the world of programming and involve quite diverse formal systems.
For each such category of objects, the programmer must define a representation in the objects of the programming language that he or she is using. This representation must satisfy criteria such as efficiency (by exploiting what the programmer has learned about algorithms) and clarity; that is, the way the data is structured should reflect the structure of the objects being represented. Representation of objects from the exterior world should not be some obscure encoding; rather, the conceptual, external structure should remain apparent.
In a typed language like Caml, the nature of an object lies in its type. It is thus natural to ask that the external structure of the objects being handled should be reflected in the types of their internal representation. For that purpose, the programmer must have the means to define a great variety of types.
In this chapter, we present the two principal type constructions that Caml offers: records and sums with constructors. You might think of a record as the product of named components, and a sum as the union of named components.
Record or Named Cartesian Products
Let's assume that we want to write programs working on complex numbers. We can imagine representing complex numbers by pairs (r,i) of type (float * float) giving the real and imaginary parts of such numbers.
This chapter is devoted to functionally programming geometric drawings. We use types and graphic functions from the library MLgraph, and we briefly describe them before we use them.
The graphic approach of MLgraph is the same as PostScript, a language used particularly for programming laser printers. This approach is oriented toward programming drawings meant to illustrate documents, rather than toward the display of drawings on screen, though there is software to display PostScript drawings as well. Such software provides a reasonable approximation of the drawings, but the quality is limited by the actual screen resolution.
PostScript is an imperative language, indeed, a low-level imperative language, which was not meant to be used directly but rather to be generated by software to produce drawings or to type-set text. The approach we present here uses the graphic model of PostScript, but replaces that language by Caml. This enables us to provide much higher level construction functions for drawing.
The graphic model of PostScript freely uses an infinite Cartesian plane with no limit on the coordinates. The basic objects are initially sketched on this plane in absolute coordinates, but a user can then freely apply transformations to modify their size and position. Such transformations rapidly get out of the constraints about positioning and let us construct complicated drawings by functionally combining simpler ones.
At first glance, producing drawings by programming seems less pleasant than using software to draw interactively, especially if we want to produce a particular design rapidly.
Here we begin the second part of this book; it is dedicated to writing various applications in Caml. Its chapters are largely independent of one another, so you can read them separately, choosing them according to your own interests or according to the needs of a course.
Chapter 5 covers symbolic computations over formal terms. In particular, it defines algorithms for pattern matching and unification used in many applications of symbolic computations, such as automatic proof, logic programming, or type synthesis.
Chapter 6 shows how to use balanced trees to represent large bodies of information. It uses the ideas of binary search trees and AVL trees as introduced in every course about algorithms.
Chapter 7 goes deeply into the methods of exploring graphs. It uses techniques of set representation introduced in Chapter 6 and presents applications from the domain of games such as the red donkey or solitaire.
Chapter 8 takes up the writing of lexical and syntactic analyzers. It uses the idea of a Caml stream.
Chapter 9 shows how to program drawing and designs. It first succinctly describes and then exploits the MLgraph library to draw trees and to tile planes and surfaces.
Chapter 10 deals with exact arithmetic, taking into account integers and rational numbers of arbitrary size. It does not attempt to describe a complete and efficient implementation of a library for large numbers, but rather, it tries to convince you that such a project is feasible.
Most programs, especially those that expect a character string as input, begin by recognizing such strings in order to determine whether or not one happens to represent valid input. Next—in fact, at the same time—such a program transforms the string into internal data that is easier to handle. The preceding chapters assumed the existence of such features and functions. This chapter indicates how to program them.
A command to an operating system (such as Is on Unix, etc.) is a very simple example of a program that takes character strings as input and then decides to accept or reject them. A compiler is a more complicated program to take character strings as input; it accepts the concrete syntax of the program to compile. A compiler decides to accept or reject the program to compile; if it accepts, it transforms the program into more easily managed data, such as abstract syntax trees.
In each of these cases, we say that the command shell or the compiler accepts phrases belonging to a certain language; in the case of the command shell, we mean the options and arguments valid for a given command; for the compiler, the programming language to compile. In both cases, they reject any other phrases. In this chapter, we will call a language the set (possibly infinite) of acceptable character strings, and the chief problem that we take up is that of recognizing the phrases belonging to a given language.
This chapter defines a group of functions to manage large-scale data sets. These functions use trees as data structures because they are so well adapted to representing data sets that evolve dynamically, that is, data sets where the size can grow or shrink during computations.
The specific data structure we use is a binary tree. The algorithms we propose make it possible to keep these binary trees balanced, and this property ensures the efficiency of operations we want to carry out.
The main operations for managing a data set are these:
searching for an element in the data set;
adding an element to the data set;
removing an element from the data set.
To search naively for an element in a data set, we can simply sweep sequentially through all the elements of the set, but that takes time proportional to the size of the data set. Its complexity is thus O(n). This technique is the only one we can use when we are managing data sets generically because we do not know any particular properties of the set.
In order to search more efficiently, we have to make hypotheses about the structure of the data set under consideration. For example, if we can assume that the data are ordered, then we can use a binary search as we did in Section 4.4.1 to represent sets by sorted arrays.
In this chapter, we describe a technique to compile a functional language like Caml into machine code. This compilation technique is strongly connected to the evaluation technique developed in the preceding chapter. In particular, the idea of an environment again plays a central role here, and we keep the idea of closure to represent functional values. However, you will see that these two ideas correspond to slightly different objects in a compiler.
To keep the compiler at a sufficiently conceptual level of description, we have opted to use machine code made up of instructions specially adapted to our compilation scheme. These instructions produce operations more complicated than the “real” instructions of an assembly language. Even so, it is clear that these instructions can be expanded into a list of machine instructions producing the same effect. The compilation scheme we present thus leads directly to a real compiler.
To execute the code produced by our compiler, we use a stack to store the intermediate values needed in computations; we also store the addresses of subprograms there, a conventional and time-honored practice. What is special, though, about compiling functional languages are the instructions to allocate memory needed to build structured values, environments, and closures. To explain these instructions and their role in the compilation of functional languages, we must first explain how we use computer memory to represent complex objects.