dom-i-nate: to exert the supreme determining or guiding influence on
Webster's Dictionary
Many dataflow analyses need to find the use-sites of each defined variable or the definition-sites of each variable used in an expression. The def-use chain is a data structure that makes this efficient: for each statement in the flow graph, the compiler can keep a list of pointers to all the use sites of variables defined there, and a list of pointers to all definition sites of the variables used there. In this way the compiler can hop quickly from use to definition to use to definition.
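As a rough sketch (the names and types here are illustrative, not the Tiger compiler's actual interfaces), def-use chains could be represented in ML like this, with each statement identified by a numbered flow-graph node:

    type var  = string                  (* variable name *)
    type node = int                     (* statement's node in the flow graph *)
    type stmInfo = {def : var list, use : var list}

    (* For a given variable, every node that defines it and every node that
       uses it, so the compiler can hop from use to definition and back. *)
    fun defSites (stms : (node * stmInfo) list) (v : var) : node list =
          List.map (fn (n, _) => n)
                   (List.filter (fn (_, info : stmInfo) =>
                                   List.exists (fn w => w = v) (#def info)) stms)

    fun useSites (stms : (node * stmInfo) list) (v : var) : node list =
          List.map (fn (n, _) => n)
                   (List.filter (fn (_, info : stmInfo) =>
                                   List.exists (fn w => w = v) (#use info)) stms)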
An improvement on the idea of def-use chains is static single-assignment form, or SSA form, an intermediate representation in which each variable has only one definition in the program text. The one (static) definition-site may be in a loop that is executed many (dynamic) times, thus the name static single-assignment form instead of single-assignment form (in which variables are never redefined at all).
The SSA form is useful for several reasons:
Dataflow analysis and optimization algorithms can be made simpler when each variable has only one definition.
If a variable has N uses and M definitions (which occupy about N + M instructions in a program), it takes space (and time) proportional to N · M to represent def-use chains – a quadratic blowup (see Exercise 19.8). For almost all realistic programs, the size of the SSA form is linear in the size of the original program (but see Exercise 19.9).
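For straight-line code the conversion to SSA form is just a renaming so that each variable is assigned exactly once; a small illustration (the variable names are arbitrary):

    original               SSA form
    a := x + y             a1 := x + y
    b := a - 1             b1 := a1 - 1
    a := y + b             a2 := y + b1
    b := x * 4             b2 := x * 4
    c := a + b             c1 := a2 + b2

Where control flow merges, a φ-function chooses among the renamed versions, so that each use still refers to a single definition.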
Heap-allocated records that are not reachable by any chain of pointers from program variables are garbage. The memory occupied by garbage should be reclaimed for use in allocating new records. This process is called garbage collection, and is performed not by the compiler but by the runtime system (the support programs linked with the compiled code).
Ideally, we would say that any record that is not dynamically live (will not be used in the future of the computation) is garbage. But, as Section 10.1 explains, it is not always possible to know whether a variable is live. So we will use a conservative approximation: we will require the compiler to guarantee that any live record is reachable; we will ask the compiler to minimize the number of reachable records that are not live; and we will preserve all reachable records, even if some of them might not be live.
Figure 13.1 shows a Tiger program ready to undergo garbage collection (at the point marked garbage-collect here). There are only three program variables in scope: p, q, and r.
MARK-AND-SWEEP COLLECTION
Program variables and heap-allocated records form a directed graph. The variables are roots of this graph. A node n is reachable if there is a path of directed edges r → ··· → n starting at some root r.
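As a minimal sketch of the mark phase in ML, assuming a toy heap representation in which each record carries a mark bit and a list of pointer fields (NONE standing for a nil pointer); the names here are illustrative only:

    datatype obj = Obj of {marked : bool ref, fields : obj option list}

    (* Depth-first marking: a record already marked has already been visited. *)
    fun mark (Obj {marked, fields}) =
          if !marked then ()
          else (marked := true;
                List.app (fn SOME r => mark r | NONE => ()) fields)

    (* Mark everything reachable from the roots (the program variables, such as
       p, q, and r in Figure 13.1); the sweep phase may then reclaim every
       record left unmarked. *)
    fun markFromRoots (roots : obj list) = List.app mark roots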
A compiler was originally a program that “compiled” subroutines [a link-loader]. When in 1954 the combination “algebraic compiler” came into use, or rather into misuse, the meaning of the term had already shifted into the present one.
Bauer and Eickel [1975]
This book describes techniques, data structures, and algorithms for translating programming languages into executable code. A modern compiler is often organized into many phases, each operating on a different abstract “language.” The chapters of this book follow the organization of a compiler, each covering a successive phase.
To illustrate the issues in compiling real programming languages, I show how to compile Tiger, a simple but nontrivial language of the Algol family, with nested scope and heap-allocated records. Programming exercises in each chapter call for the implementation of the corresponding phase; a student who implements all the phases described in Part I of the book will have a working compiler. Tiger is easily modified to be functional or object-oriented (or both), and exercises in Part II show how to do this. Other chapters in Part II cover advanced techniques in program optimization. Appendix A describes the Tiger language.
The interfaces between modules of the compiler are almost as important as the algorithms inside the modules. To describe the interfaces concretely, it is useful to write them down in a real programming language. This book uses ML – a strict, statically typed functional programming language with modular structure.
ab-stract: disassociated from any specific instance
Webster's Dictionary
A compiler must do more than recognize whether a sentence belongs to the language of a grammar – it must do something useful with that sentence. The semantic actions of a parser can do useful things with the phrases that are parsed.
In a recursive-descent parser, semantic action code is interspersed with the control flow of the parsing actions. In a parser specified in Yacc, semantic actions are fragments of C program code attached to grammar productions.
SEMANTIC ACTIONS
Each terminal and nonterminal may be associated with its own type of semantic value. For example, in a simple calculator using Grammar 3.35, the type associated with exp and INT might be int; the other tokens would not need to carry a value. The type associated with a token must, of course, match the type that the lexer returns with that token.
For a rule A → B C D, the semantic action must return a value whose type is the one associated with the nonterminal A. But it can build this value from the values associated with the matched terminals and nonterminals B, C, D.
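As an illustration, the rules of a small calculator might carry their semantic actions like this in ML-Yacc notation (a sketch; the declarations giving INT an int value and exp the type int are assumed):

    exp : INT              (INT)
        | exp PLUS exp     (exp1 + exp2)
        | exp TIMES exp    (exp1 * exp2)

Each action builds the value for exp from the values of the matched symbols, just as described above for a rule A → B C D.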
RECURSIVE DESCENT
In a recursive-descent parser, the semantic actions are the values returned by parsing functions, or the side effects of those functions, or both.
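For instance, a recursive-descent parser for a tiny sum grammar, sketched here as functions over a token list (the token type and the grammar are illustrative only), can return the computed value directly:

    datatype token = INT of int | PLUS | EOF

    (* parseT consumes one INT token; its semantic value is the integer. *)
    fun parseT (INT n :: rest) = (n, rest)
      | parseT _ = raise Fail "expected integer"

    (* parseE parses T (+ T)* ; the semantic action for each production is
       simply the value computed and returned by the function. *)
    fun parseE toks =
          let val (v, rest) = parseT toks
              fun more (acc, PLUS :: rest') =
                    let val (v', rest'') = parseT rest'
                    in more (acc + v', rest'') end
                | more (acc, rest') = (acc, rest')
          in more (v, rest) end

    (* parseE [INT 1, PLUS, INT 2, EOF]  evaluates to  (3, [EOF]) *)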
loop: a series of instructions that is repeated until a terminating condition is reached
Webster's Dictionary
Loops are pervasive in computer programs, and a great proportion of the execution time of a typical program is spent in one loop or another. Hence it is worthwhile devising optimizations to make loops go faster. Intuitively, a loop is a sequence of instructions that ends by jumping back to the beginning. But to be able to optimize loops effectively we will use a more precise definition.
A loop in a control-flow graph is a set of nodes S including a header node h with the following properties:
From any node in S there is a path of directed edges leading to h.
There is a path of directed edges from h to any node in S.
There is no edge from any node outside S to any node in S other than h.
Thus, the dictionary definition (from Webster's) is not the same as the technical definition.
Figure 18.1 shows some loops. A loop entry node is one with some predecessor outside the loop; a loop exit node is one with a successor outside the loop. Figures 18.1c, 18.1d, and 18.1e illustrate that a loop may have multiple exits, but may have only one entry. Figures 18.1e and 18.1f contain nested loops.
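The definition can be checked directly. In the sketch below (with assumed representations: nodes are integers, the graph is a successor function, and nodes lists every node in the graph), isLoop tests the three properties for a candidate set S with header h:

    fun member x xs = List.exists (fn y => y = x) xs

    (* All nodes reachable from n along directed edges. *)
    fun reachable (succ : int -> int list) (n : int) : int list =
          let fun go (visited, []) = visited
                | go (visited, m :: work) =
                    if member m visited then go (visited, work)
                    else go (m :: visited, succ m @ work)
          in go ([], [n]) end

    fun isLoop (succ : int -> int list, nodes : int list) (s : int list, h : int) : bool =
          member h s
          andalso List.all (fn n => member h (reachable succ n)) s    (* every node of S reaches h *)
          andalso List.all (fn n => member n (reachable succ h)) s    (* h reaches every node of S *)
          andalso List.all                                            (* the only entry to S is h  *)
                    (fn n => member n s orelse
                             List.all (fn m => not (member m s) orelse m = h) (succ n))
                    nodes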
REDUCIBLE FLOW GRAPHS
A reducible flow graph is one in which the dictionary definition of loop corresponds more closely to the technical definition; but let us develop a more precise definition.
sched-ule: a procedural plan that indicates the time and sequence of each operation
Webster's Dictionary
A simple computer can process one instruction at a time. First it fetches the instruction, then decodes it into opcode and operand specifiers, then reads the operands from the register bank (or memory), then performs the arithmetic denoted by the opcode, then writes the result back to the register bank (or memory); and then fetches the next instruction.
Modern computers can execute parts of many different instructions at the same time. At the same time the processor is writing results of two instructions back to registers, it may be doing arithmetic for three other instructions, reading operands for two more instructions, decoding four others, and fetching yet another four. Meanwhile, there may be five instructions delayed, awaiting the results of memory-fetches.
Such a processor usually fetches instructions from a single flow of control; it's not that several programs are running in parallel, but that adjacent instructions of a single program are decoded and executed simultaneously. This is called instruction-level parallelism (ILP), and is the basis for much of the astounding advance in processor speed in the last decade of the twentieth century.
A pipelined machine performs the write-back of one instruction in the same cycle as the arithmetic “execute” of the next instruction and the operand read of the one after that, and so on. A very-long-instruction-word (VLIW) machine issues several instructions in the same processor cycle; the compiler must ensure that they are not data-dependent on each other.
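One conventional way to picture the pipelined case, using the five steps named above (fetch, decode, read operands, execute, write back):

    cycle      1    2    3    4    5    6    7
    instr 1    F    D    R    E    W
    instr 2         F    D    R    E    W
    instr 3              F    D    R    E    W

In cycle 5, instruction 1 writes back its result while instruction 2 executes and instruction 3 reads its operands; a new instruction can be fetched every cycle.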
anal-y-sis: an examination of a complex, its elements, and their relations
Webster's Dictionary
An optimizing compiler transforms programs to improve their efficiency without changing their output. There are many transformations that improve efficiency:
Register allocation: Keep two nonoverlapping temporaries in the same register.
Common-subexpression elimination: If an expression is computed more than once, eliminate one of the computations.
Dead-code elimination: Delete a computation whose result will never be used.
Constant folding: If the operands of an expression are constants, do the computation at compile time.
This is not a complete list of optimizations. In fact, there can never be a complete list.
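To make the last item in the list concrete, here is a minimal constant-folding pass over a toy expression tree in ML (the datatype is illustrative, not the compiler's actual Tree language):

    datatype binop = PLUS | MINUS | MUL
    datatype exp = CONST of int
                 | TEMP of string
                 | BINOP of binop * exp * exp

    (* If both operands fold to constants, do the arithmetic at compile time;
       otherwise rebuild the node from its folded children. *)
    fun fold (BINOP (oper, a, b)) =
          (case (fold a, fold b)
             of (CONST i, CONST j) =>
                  CONST (case oper of PLUS => i + j | MINUS => i - j | MUL => i * j)
              | (a', b') => BINOP (oper, a', b'))
      | fold e = e

    (* fold (BINOP (PLUS, CONST 1, BINOP (MUL, CONST 2, CONST 3)))  evaluates to  CONST 7 *)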
NO MAGIC BULLET
Computability theory shows that it will always be possible to invent new optimizing transformations.
Let us say that a fully optimizing compiler is one that transforms each program P to a program Opt(P) that is the smallest program with the same input/output behavior as P. We could also imagine optimizing for speed instead of program size, but let us choose size to simplify the discussion.
For any program Q that produces no output and never halts, Opt(Q) is short and easily recognizable:
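    L1: goto L1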
Therefore, if we had a fully optimizing compiler we could use it to solve the halting problem; to see if there exists an input on which P halts, just see if Opt(P) is the one-line infinite loop. But we know that no computable algorithm can always tell whether programs halt, so a fully optimizing compiler cannot be written either.
ca-non-i-cal: reduced to the simplest or clearest schema possible
Webster's Dictionary
The trees generated by the semantic analysis phase must be translated into assembly or machine language. The operators of the Tree language are chosen carefully to match the capabilities of most machines. However, there are certain aspects of the Tree language that do not correspond exactly with machine languages, and some aspects of the Tree language interfere with compile-time optimization analyses.
For example, it's useful to be able to evaluate the subexpressions of an expression in any order. But the subexpressions of Tree.exp can contain side effects – eseq and call nodes that contain assignment statements and perform input/output. If tree expressions did not contain eseq and call nodes, then the order of evaluation would not matter.
Some of the mismatches between Trees and machine-language programs are:
The cjump instruction can jump to either of two labels, but real machines' conditional jump instructions fall through to the next instruction if the condition is false.
eseq nodes within expressions are inconvenient, because they make different orders of evaluating subtrees yield different results.
call nodes within expressions cause the same problem.
call nodes within the argument-expressions of other call nodes will cause problems when trying to put arguments into a fixed set of formal-parameter registers.
Why does the Tree language allow eseq and two-way cjump, if they are so troublesome? Because they make it much more convenient for the Translate (translation to intermediate code) phase of the compiler.
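A sketch of the kind of rewrite used to hoist eseq nodes upward and out of expressions, on a toy fragment of the Tree language (only the constructors needed here, with illustrative types; the real Tree language has more):

    datatype stm = SEQ of stm * stm
                 | MOVE of exp * exp
                 | EXP of exp
    and exp  = CONST of int
             | TEMP of string
             | BINOP of string * exp * exp
             | ESEQ of stm * exp

    (* Two sound one-step rewrites: nested eseqs combine into one, and an eseq
       in the left operand of a binop may be hoisted above the binop, since its
       statement would have executed before the operator in any case. *)
    fun lift (ESEQ (s1, ESEQ (s2, e)))        = ESEQ (SEQ (s1, s2), e)
      | lift (BINOP (oper, ESEQ (s, e1), e2)) = ESEQ (s, BINOP (oper, e1, e2))
      | lift e = e

A full canonicalization pass applies rewrites like these repeatedly, and must be more careful with the right operand of a binop and with call nodes, where the hoisted statement could change the value of the other operand.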
trans-late: to turn into one's own or another language
Webster's Dictionary
The semantic analysis phase of a compiler must translate abstract syntax into abstract machine code. It can do this after type-checking, or at the same time.
Though it is possible to translate directly to real machine code, this hinders portability and modularity. Suppose we want compilers for N different source languages, targeted to M different machines. In principle this is N · M compilers (Figure 7.1a), a large implementation task.
An intermediate representation (IR) is a kind of abstract machine language that can express the target-machine operations without committing to too much machine-specific detail. But it is also independent of the details of the source language. The front end of the compiler does lexical analysis, parsing, semantic analysis, and translation to intermediate representation. The back end does optimization of the intermediate representation and translation to machine language.
A portable compiler translates the source language into IR and then translates the IR into machine language, as illustrated in Figure 7.1b. Now only N front ends and M back ends are required. Such an implementation task is more reasonable.
Even when only one front end and one back end are being built, a good IR can modularize the task, so that the front end is not complicated with machine-specific details, and the back end is not bothered with information specific to one source language.
Chapters 2–11 have described the fundamental components of a good compiler: a front end, which does lexical analysis, parsing, construction of abstract syntax, type-checking, and translation to intermediate code; and a back end, which does instruction selection, dataflow analysis, and register allocation.
What lessons have we learned? I hope that the reader has learned about the algorithms used in different components of a compiler and the interfaces used to connect the components. But the author has also learned quite a bit from the exercise.
My goal was to describe a good compiler that is, to use Einstein's phrase, “as simple as possible – but no simpler.” I will now discuss the thorny issues that arose in designing Tiger and its compiler.
Nested functions. Tiger has nested functions, requiring some mechanism (such as static links) for implementing access to nonlocal variables. But many programming languages in widespread use – C, C++, Java – do not have nested functions or static links. The Tiger compiler would become simpler without nested functions, for then variables would not escape, and the FindEscape phase would be unnecessary. But there are two reasons for explaining how to compile nonlocal variables. First, there are programming languages where nested functions are extremely useful – these are the functional languages described in Chapter 15. And second, escaping variables and the mechanisms necessary to handle them are also found in languages where addresses can be taken (such as C) or with call-by-reference (such as C++).
This book is not about type theories in general but about one very neat and special system called “TA” for “type-assignment”. Its types contain type-variables and arrows but nothing else, and its terms are built by λ-abstraction and application from term-variables and nothing else. Its expressive power is close to that of the system called simple type theory that originated with Alonzo Church.
TA is polymorphic in the sense that a term can have more than one type, indeed an infinite number of types; the identity term λx.x, for example, receives the types a → a, (a → a) → (a → a), and so on, one for each type substituted for the variable a. On the other hand the system has no ∀-types and hence it is weaker than the strong polymorphic theories in current use in logic and programming. However, it lies at the core of nearly every one of them and its properties are so distinctive and even enjoyable that I believe the system is worth isolating and studying on its own. That is the aim of this book. In it I hope to try to pass on to the reader the pleasure the system's properties have given me.
TA is also an excellent training ground for learning the techniques of type-theory as a whole. Its methods and algorithms are not trivial but the main lines of most of them become clear once the basic concepts have been understood. Many ideas that are complicated and tedious to formulate for stronger type-theories, and many complex techniques for analysing structures in these theories, appear in TA in a very clean and neat stripped-down form.