The purpose of this chapter is to introduce the notion of program-size complexity. We do this by telling a smoothed-over story of the evolution of the concept, giving proof sketches rather than formal proofs, and starting with program size in LISP. In Chapter 6 we will start over, and give formal definitions and proofs.
Complexity via LISP Expressions
Having gone to the trouble of defining a particularly clean and elegant version of LISP, one in which the definition of LISP in LISP really is equivalent to running the interpreter, let's start using it to prove theorems! The usual approach to program-size complexity is rather abstract, in that no particular programming language is directly visible. Eventually, we shall have to go a little bit in this direction. But we can start with a very straightforward concrete approach, namely to consider the size of a LISP expression measured by the number of characters it has. This will help to build our intuition before we are forced to use a more abstract approach to get stronger theorems. The path we shall follow is similar to that in my first paper [CHAITIN (1966, 1969a)], except that there I used Turing machines instead of LISP.
So we shall now study, for any given LISP object, its program-size complexity, which is the size of the smallest program (i.e., S-expression) for calculating it. As for notation, we shall use H_LISP (“information content measured using LISP”), usually abbreviated in this chapter by omitting the subscript for LISP. And we write |S| for the size in characters of an S-expression S.
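To make the definition concrete, here is a minimal sketch in Python (not in the LISP of this monograph) of what H(S) means operationally: enumerate candidate programs in order of increasing size, and return the size of the first one whose value is the desired object. The alphabet and the toy interpreter lisp_eval below are hypothetical stand-ins invented for this illustration; the real search is of course not effectively computable, since one cannot decide which programs halt.

    # Sketch only: H(x) as the size of the smallest program for x.
    # ALPHABET and lisp_eval are hypothetical stand-ins for the
    # S-expression alphabet and the interpreter of Chapter 3.
    from itertools import product

    ALPHABET = "'()abc"                # toy alphabet, an assumption

    def lisp_eval(program):
        # Toy interpreter: 'E evaluates to E (quote); nothing else
        # has a value.
        return program[1:] if program.startswith("'") else None

    def H(x, max_size=6):
        """Size in characters of the smallest program whose value
        is x, searching programs of up to max_size characters."""
        for n in range(1, max_size + 1):
            for chars in product(ALPHABET, repeat=n):
                if lisp_eval("".join(chars)) == x:
                    return n           # |program| = n
        return None

    print(H("abc"))                    # 4: the program 'abc

Even in this toy setting the quote program shows that H(x) never exceeds |x| + 1.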
In Part I of this monograph, we do the bulk of the preparatory work that enables us in Part II to exhibit an exponential diophantine equation that encodes the successive bits of the halting probability Ω.
In Chapter 2 we present a method for compiling register machine programs into exponential diophantine equations. In Chapter 3 we present a stripped-down version of pure LISP. And in Chapter 4 we present a register machine interpreter for this LISP, and then compile it into a diophantine equation. The resulting equation, which unfortunately is too large to exhibit here in its entirety, has a solution, and only one, if the binary representation of a LISP expression that halts, i.e., that has a value, is substituted for a distinguished variable in it. It has no solution if the number substituted is the binary representation of a LISP expression without a value.
Having dealt with programming issues, we can then proceed in Part II to theoretical matters.
In this chapter we present a new definition of program-size complexity. H(A, B/C, D) is defined to be the size in bits of the shortest self-delimiting program for calculating strings A and B if one is given a minimal-size self-delimiting program for calculating strings C and D. As is the case in LISP, programs are required to be self-delimiting, but instead of achieving this with balanced parentheses, we merely stipulate that no meaningful program be a prefix of another. Moreover, instead of being given C and D directly, one is given a program for calculating them that is minimal in size. Unlike previous definitions, this one has precisely the formal properties of the entropy concept of information theory.
What train of thought led us to this definition? Following [CHAITIN (1970a)], think of a computer as decoding equipment at the receiving end of a noiseless binary communications channel. Think of its programs as code words, and of the result of the computation as the decoded message. Then it is natural to require that the programs/code words form what is called a “prefix-free set,” so that successive messages sent across the channel (e.g. subroutines) can be separated. Prefix-free sets are well understood; they are governed by the Kraft inequality, which therefore plays an important role in this chapter.
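As an illustration of the definitions (a Python sketch, not part of the formalism), the following checks that a set of binary code words is prefix-free and computes its Kraft sum Σ 2^(−|p|), which the Kraft inequality bounds by 1 for any prefix-free set.

    # Sketch: prefix-free codes and the Kraft inequality.
    def is_prefix_free(codes):
        """True iff no code word is a proper prefix of another.
        After sorting, a prefix sorts immediately before some
        extension of itself, so checking adjacent pairs suffices."""
        codes = sorted(codes)
        return all(not codes[i + 1].startswith(codes[i])
                   for i in range(len(codes) - 1))

    def kraft_sum(codes):
        """Sum of 2**(-len(p)) over the code words p."""
        return sum(2.0 ** -len(p) for p in codes)

    codes = ["0", "10", "110", "111"]   # a complete prefix-free code
    print(is_prefix_free(codes))        # True
    print(kraft_sum(codes))             # 1.0, attaining the Kraft bound

Because no code word extends another, a concatenation of code words can be parsed unambiguously from left to right, which is exactly what allows successive messages, or subroutines, to be separated.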
One is thus led to define the relative complexity H(A, B/C, D) of A and B with respect to C and D to be the size of the shortest self-delimiting program for producing A and B from C and D. However, this is still not quite right.
In this chapter we present the beautiful work of JONES and MATIJASEVIC (1984), which is the culmination of a half century of development starting with GÖDEL (1931), and in which the paper of DAVIS, PUTNAM, and ROBINSON (1961) on Hilbert's tenth problem was such a notable milestone. The aim of this work is to encode computations arithmetically. As Gödel showed with his technique of Gödel numbering and primitive recursive functions, the metamathematical assertion that a particular proposition follows by certain rules of inference from a particular set of axioms can be encoded as an arithmetical or number-theoretic proposition. This shows that number theory well deserves its reputation as one of the hardest branches of mathematics, for any formalized mathematical assertion can be encoded as a statement about positive integers. And the work of Davis, Putnam, Robinson, and Matijasevic has shown that any computation can be encoded as a polynomial. The proof of this assertion, which shows that Hilbert's tenth problem is unsolvable, has been simplified over the years, but it is still fairly intricate and involves a certain amount of number theory; for a review see DAVIS, MATIJASEVIC, and ROBINSON (1976).
Formulas for primes: An illustration of the power and importance of these ideas is the fact that a trivial corollary of this work has been the construction of polynomials which generate or represent the set of primes; JONES et al. (1976) have performed the extra work to actually exhibit manageable polynomials having this property.
The aim of this book is to present the strongest possible version of Gödel's incompleteness theorem, using an information-theoretic approach based on the size of computer programs.
One half of the book is concerned with studying Ω, the halting probability of a universal computer if its program is chosen by tossing a coin. The other half of the book is concerned with encoding Ω as an algebraic equation in integers, a so-called exponential diophantine equation.
Gödel's original proof of his incompleteness theorem is essentially the assertion that one cannot always prove that a program will fail to halt, that is, that it will never produce any output. He then converts this into an arithmetical assertion. Over the years this has been improved; it follows from the work on Hilbert's 10th problem that Gödel's theorem is equivalent to the assertion that one cannot always prove that a diophantine equation has no solutions, if this is the case.
In our approach to incompleteness, we shall ask whether or not a program produces an infinite amount of output rather than asking whether it produces any; this is equivalent to asking whether or not a diophantine equation has infinitely many solutions instead of asking whether or not it is solvable.
If one asks whether or not a diophantine equation has a solution for N different values of a parameter, the N different answers to this question are not independent; in fact, they amount to only log₂ N bits of information, for it suffices to know how many of the N cases have a solution: given that count, one can in principle search until that many solutions have been found, and conclude that the remaining equations have none.
Having developed the necessary information-theoretic formalism in Chapter 6, and having studied the notion of a random real in Chapter 7, we can now begin to derive incompleteness theorems.
The setup is as follows. The axioms of a formal theory are considered to be encoded as a single finite bit string, the rules of inference are considered to be an algorithm for enumerating the theorems given the axioms, and in general we shall fix the rules of inference and vary the axioms. More formally, the rules of inference F may be considered to be an r.e. set of propositions of the form
“Axioms ⊢_F Theorem”.
The r.e. set of theorems deduced from the axiom A is determined by selecting from the set F the theorems in those propositions which have the axiom A as an antecedent. In general we'll consider the rules of inference F to be fixed and study what happens as we vary the axioms A. By an n-bit theory we shall mean the set of theorems deduced from an n-bit axiom.
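In programming terms, an n-bit theory is just a filter on the fixed r.e. set F. Here is a Python sketch; the generator rules_of_inference is a toy stand-in invented purely for illustration.

    from itertools import islice

    def theorems(F, A):
        """Enumerate the theorems deduced from axiom A: select from
        the r.e. set F (a possibly infinite stream of (axiom,
        theorem) pairs) those with antecedent A."""
        for axiom, theorem in F:
            if axiom == A:
                yield theorem

    def rules_of_inference():
        # Toy stand-in for F: every axiom string "proves" itself
        # and its doubling -- an assumption for illustration only.
        n = 1
        while True:
            for k in range(2 ** n):
                axiom = format(k, "0{}b".format(n))
                yield axiom, axiom
                yield axiom, axiom + axiom
            n += 1

    A = "01"                             # a 2-bit axiom
    print(list(islice(theorems(rules_of_inference(), A), 2)))
    # ['01', '0101']

Note that the enumeration never terminates; a theory is an r.e. set of theorems, not a decidable one.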
Incompleteness Theorems for Lower Bounds on Information Content
Let's start by rederiving within our current formalism an old and very basic result, which states that even though most strings are random, one can never prove that a specific string has this property.
As we saw when we studied randomness, if one produces a bit string s by tossing a coin n times, 99.9% of the time it will be the case that H(s) ≈ n + H(n). In fact, if one lets n go to infinity, with probability one H(s) > n for all but finitely many n (Theorem R5).
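The counting argument behind such statements is worth making explicit: there are fewer than 2^(n−k) programs shorter than n − k bits, and each computes at most one string, so at most a 2^(−k) fraction of the 2^n strings of length n can have complexity below n − k. A small Python check of this bound (which ignores the self-delimiting refinements, but is the heart of the matter):

    def fraction_compressible(n, k):
        """Upper bound on the fraction of n-bit strings s with
        H(s) < n - k: there are 1 + 2 + ... + 2**(n-k-1) programs
        of length below n - k, each computing at most one string."""
        programs = sum(2 ** j for j in range(n - k))  # lengths 0..n-k-1
        return programs / 2 ** n                      # < 2**(-k)

    print(fraction_compressible(20, 10))   # 0.000976..., below 2**-10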
In this chapter we convert the definition of LISP in LISP given in Section 3.6 into a register machine program. Then we compile this register machine program into an exponential diophantine equation.
Register Machine Pseudo-Instructions
The first step to program an interpreter for our version of pure LISP is to write subroutines for breaking S-expressions apart (SPLIT) and for putting them back together again (JOIN). The next step is to use SPLIT and JOIN to write routines that push and pop the interpreter stack. Then we can raise the level of discourse by defining register machine pseudo-instructions which are expanded by the assembler into calls to these routines; i.e., we extend register machine language with pseudo-machine instructions which expand into several real machine instructions. Thus we have four “microcode” subroutines: SPLIT, JOIN, PUSH, and POP. SPLIT and JOIN are leaf routines, and PUSH and POP call SPLIT and JOIN.
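To convey what SPLIT and JOIN do, here is a character-level sketch in Python (an illustration only; it assumes multi-character atoms separated by blanks, a convention of this sketch rather than of the LISP in the text, and it omits all error checking).

    def join(x, y):
        """Cons the S-expression x onto the front of the list y,
        both given as character strings; y must look like (...)."""
        return "(" + x + ")" if y == "()" else "(" + x + " " + y[1:]

    def split(z):
        """Inverse of join: break z = (x ...) into the head x and
        the tail (...). A head that is itself a list ends where
        its parentheses first balance."""
        body = z[1:-1].strip()
        if body[0] != "(":                     # head is an atom
            head, _, tail = body.partition(" ")
        else:                                  # head is a sublist
            depth = 0
            for i, c in enumerate(body):
                depth += (c == "(") - (c == ")")
                if depth == 0:
                    break
            head, tail = body[:i + 1], body[i + 1:]
        return head, "(" + tail.strip() + ")"

    print(split("((a b) c d)"))    # ('(a b)', '(c d)')
    print(join("(a b)", "(c d)"))  # ((a b) c d)

PUSH and POP then amount to joining a value onto the front of the STACK register and splitting it back off.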
Figure 9 is a table giving the twelve register machine pseudo-instructions.
Now a few words about register usage; there are only 19 registers! First of all, the S-expression to be evaluated is input in EXPRESSION, and the value of this S-expression is output in VALUE. There are three large permanent data structures used by the interpreter:
the association list ALIST which contains all variable bindings,
the interpreter STACK used for saving and restoring information when the interpreter calls itself, and
the current remaining DEPTH limit on evaluations.
All other registers are either temporary scratch registers used by the interpreter (FUNCTION, ARGUMENTS, VARIABLES, X, and Y), or hidden registers used by the microcode rather than directly by the interpreter.
The programs in this book were run under the VM/CMS time-sharing system on a large IBM 370 mainframe, a 3090 processor. A virtual machine with 4 megabytes of storage was used.
The compiler for converting register machine programs into exponential diophantine equations is a 700-line REXX program. REXX is a very nice and easy-to-use pattern-matching string-processing language implemented by means of a very efficient interpreter.
There are three implementations of our version of pure LISP:
The first is in REXX, and is 350 lines of code. This is the simplest implementation of the LISP interpreter, and it serves as an “executable design document.”
The second is on a simulated register machine. This implementation consists of a 250-line REXX driver that converts M-expressions into S-expressions, remembers function definitions, and does most input and output formatting, and a 1000-line 370 Assembler H expression evaluator. The REXX driver wraps each expression in a lambda expression which binds all current definitions, and then hands it to the assembler expression evaluator. The 1000 lines of assembler code include the register machine simulator, many macro definitions, and the LISP interpreter in register machine language. This is the slowest of the three implementations; its goals are theoretical, but it is fast enough for testing and debugging.
The third LISP implementation, like the previous one, has a 250-line REXX driver; the real work is done by a 700-line 370 Assembler H expression evaluator. This is the high-performance evaluator, and it is amazingly small: less than 8K bytes of 370 machine language code, tables, and buffers, plus a megabyte of storage for the stack and two megabytes for the heap, so that there is another megabyte left over for the REXX driver. It gets by without a garbage collector: since all information that must be preserved from one evaluation to the next (mostly function definitions) is in the form of REXX character strings, the expression evaluator can simply be reinitialized after each evaluation. Another reason for the simplicity and speed of this interpreter is that our version of pure LISP is “permissive”; error checking and the production of diagnostic messages usually account for a substantial portion of an interpreter.
In this chapter we present a “permissive” simplified version of pure LISP designed especially for metamathematical applications. Aside from the rule that an S-expression must have balanced ()'s, the only way that an expression can fail to have a value is by looping forever. This is important because algorithms that simulate other algorithms chosen at random must be able to run garbage safely.
This version of LISP developed from one originally designed for teaching [CHAITIN (1976a)]. The language was reduced to its essence and made as easy to learn as possible, and was actually used in several university courses. Like APL, this version of LISP is so concise that one can write it as fast as one thinks. This LISP is so simple that an interpreter for it can be coded in three hundred and fifty lines of REXX.
How to read this chapter: This chapter can be quite difficult to understand, especially if one has never programmed in LISP before. The correct approach is to read it several times, and to try to work through all the examples in detail. Initially the material will seem completely incomprehensible, but all of a sudden the pieces will snap together into a coherent whole. Alternatively, one can skim Chapters 3, 4, and 5, which depend heavily on the details of this LISP, and proceed directly to the more theoretical material in Chapter 6, which could be based on Turing machines or any other formalism for computation.
The purpose of Chapters 3 and 4 is to show how easy it is to implement an extremely powerful and theoretically attractive programming language on the abstract register machines that we presented in Chapter 2.
Cambridge LCF proofs are conducted in PPλ, a logic of domain theory. Leaving aside domains, which are discussed in later chapters, PPλ is a typical natural deduction formulation of first-order logic. This chapter introduces formal proof, first-order logic, natural deduction, and sequent calculi. The discussion of semantics is brief and informal; the emphasis is on how to construct formal proofs. See a logic textbook for a proper account of model theory.
If you seriously intend to construct proofs, memorizing the inference rules is not enough. You must learn the individual characteristics and usage of each rule. Many sample proofs are given; study every line. Work the exercises.
Fundamentals of formal logic
A formal logic or calculus is a game for producing symbolic objects according to given rules. Sometimes the motivation of the rules is vague; with the lambda calculus there ensued a protracted enquiry into the meaning of lambda expressions. But usually the rules are devised with respect to a well-understood meaning or semantics. Too many of us perform plausible derivations using notations that have no precise meaning. Most mathematical theories are interpreted in set theory: each term corresponds to a set; each rule corresponds to a fact about sets. Set theory is taken to be the foundation of everything else. Its axioms are justified by informal but widely accepted intuitions that sets exist, the union of two sets is a set, and so forth.
In this appendix we prove the results concerning the number of S-expressions of a given size that were used in Chapter 5 to show that there are few minimal LISP programs, and to obtain other results. We have postponed the combinatorial and analytic arguments to here in order not to interrupt our discussion of program size with material of a rather different mathematical nature. However, the estimates we obtain here of the number of syntactically correct LISP programs of a given size are absolutely fundamental to a discussion of the basic program-size characteristics of LISP. And if we were to discuss another programming language, estimates of the number of different possible programs and outputs of a given size would also be necessary. In fact, in my first paper on program-size complexity [CHAITIN (1966)], I go through an equivalent discussion of the number of different Turing machine programs with n states and m tape symbols, but using quite different methods.
Let us start by stating more precisely what we are studying, and by looking at some examples. Let α be the number of different characters in the alphabet used to form S-expressions, not including the left and right parentheses. In other words, α is the number of atoms, excluding the empty list. In fact α = 126, but let's proceed more generally. We shall study S_n, the number of different S-expressions n characters long that can be formed from these α atoms by grouping them together with parentheses. The only restriction that we need to take into account is that left and right parentheses must balance for the first time precisely at the end of the expression.
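Although the analytic estimates are postponed, S_n itself is easy to compute exactly. The Python sketch below uses the conventions just stated (α single-character atoms, each list contributing its two parentheses to its size, no separators): an S-expression is either an atom or a parenthesized sequence of S-expressions, so S_1 = α, S_2 = 1 (the empty list), and S_n = L_(n−2) for n > 2, where L_m counts the (possibly empty) sequences of S-expressions of total size m.

    def count_sexps(n_max, alpha=126):
        """S[n] = number of S-expressions exactly n characters long;
        L[m] = number of sequences of S-expressions of total size m.
        S[1] = alpha, S[2] = 1, and S[n] = L[n-2] for n > 2."""
        S = [0] * (n_max + 1)
        L = [0] * (n_max + 1)
        L[0] = 1                       # the empty sequence
        if n_max >= 1:
            S[1] = alpha               # a lone atom
        if n_max >= 2:
            S[2] = 1                   # the empty list ()
        for m in range(1, n_max - 1):
            L[m] = sum(S[j] * L[m - j] for j in range(1, m + 1))
            S[m + 2] = L[m]            # wrap the sequence in ()'s
        return S

    print(count_sexps(4))   # [0, 126, 1, 126, 15877]

For α = 126 this gives S_4 = 126² + 1 = 15877: a list of two atoms, or the list (()) containing the empty list.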