In Chapter 2 we saw that a computer performs computation by processing instructions. A computer instruction set must include a variety of features to achieve flexible programmability, including varied arithmetic and logic operations, conditional computation, and application-defined data structures. As a result, the execution of each instruction requires a number of steps: instruction fetch and decode, arithmetic or logic computation, memory read or write, and determination of the next instruction. The instruction set definition is a contract between software and hardware, the fundamental software–hardware interface, and it is this contract that enables software to be portable. After portability, the next critical attribute is performance, so computer hardware is designed to execute instructions as fast as possible.
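To make these steps concrete, the following toy sketch in C walks through fetch and decode, computation, memory access, and selection of the next instruction for a tiny program. The instruction format and opcode names are invented for illustration; they are not RISC-V or any real instruction set.

#include <stdint.h>
#include <stdio.h>

/* Toy interpreter: fetch and decode, compute, access memory,
 * and choose the next instruction. The "ISA" here is hypothetical. */

enum { OP_ADDI, OP_ADD, OP_STORE, OP_BNEZ, OP_HALT };

typedef struct { int op, rd, rs1, rs2, imm; } Instr;

static long regs[8];
static long mem[256];

static void run(const Instr *prog) {
    int pc = 0;
    for (;;) {
        Instr i = prog[pc];                       /* fetch and decode  */
        int next = pc + 1;                        /* default successor */
        switch (i.op) {                           /* compute / memory  */
        case OP_ADDI:  regs[i.rd] = regs[i.rs1] + i.imm;        break;
        case OP_ADD:   regs[i.rd] = regs[i.rs1] + regs[i.rs2];  break;
        case OP_STORE: mem[i.imm] = regs[i.rs1];                 break;
        case OP_BNEZ:  if (regs[i.rs1] != 0) next = i.imm;       break;
        case OP_HALT:  return;
        }
        pc = next;                                /* next instruction  */
    }
}

int main(void) {
    /* Sum 5 + 4 + 3 + 2 + 1: r2 counts down, r1 accumulates. */
    Instr prog[] = {
        { OP_ADDI,  2, 0, 0, 5 },   /* r2 = 5             */
        { OP_ADDI,  1, 0, 0, 0 },   /* r1 = 0             */
        { OP_ADD,   1, 1, 2, 0 },   /* r1 += r2           */
        { OP_ADDI,  2, 2, 0, -1 },  /* r2 -= 1            */
        { OP_BNEZ,  0, 2, 0, 2 },   /* if r2 != 0 goto 2  */
        { OP_STORE, 0, 1, 0, 0 },   /* mem[0] = r1        */
        { OP_HALT,  0, 0, 0, 0 },
    };
    run(prog);
    printf("sum = %ld\n", mem[0]);  /* prints 15 */
    return 0;
}

Real processors perform the same logical steps in hardware and, as discussed later in this chapter, keep many instructions in flight at once.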
Memory is a critical part of computing systems. In the organization of computers and the programming model, memory was separated from the computing (CPU) part, first logically and later physically. This separation of CPU and memory, in the structure known as the von Neumann architecture, was covered in Chapter 2 and is illustrated in Figure 5.1.
The sequential abstraction has enabled software to manage the complex demands of constructing computing applications, debugging software and hardware, and composing programs. However, with the end of Dennard scaling (see Section 3.3.4), we have been unable to create sequential computers with sufficient speed and capacity to meet the needs of ever-larger computing applications. As a result, computer hardware systems were forced to adopt explicit parallelism, both within a single chip (multicore CPUs) and at datacenter scale (supercomputers and cloud computing). In this chapter, we describe this shift to parallelism. In single-chip CPUs, the shift has produced multicore processors, first with 2 or 4 cores, growing rapidly to 64 cores (2020) and beyond. Understanding multicore chips, the parallel building blocks of even larger parallel computers, provides an invaluable perspective on how to reason about and increase performance.
A computer instruction set defines the correct execution of a program as the instructions processed one after another – that is, sequentially (see Chapter 2). This sequential abstraction enables the composition of arithmetic operations (add, xor) and operations on memory (state), and it grants extraordinary power to branch instructions, which compose blocks of instructions conditionally. In this chapter, we explore the central importance of the sequential abstraction for managing the complexity of large-scale software and hardware systems. We then consider creative techniques that preserve the illusion of sequence while allowing the processor implementation to increase the speed of program progress. These techniques, known collectively as instruction-level parallelism (ILP), accelerate program execution by executing a program's instructions in pipelined (overlapped), out-of-order, and even speculative fashion. Understanding ILP provides a perspective on how commercial processors really execute programs, far different from the step-by-step recipe of the sequential abstraction.
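As a rough back-of-the-envelope illustration of why pipelining pays off, the short C sketch below assumes an idealized k-stage pipeline with no stalls; the stage count of 5 is an assumption for illustration, not a figure from the text. Under that assumption, n instructions take n + k - 1 cycles instead of n * k, approaching a k-fold speedup for long instruction streams.

#include <stdio.h>

/* Idealized pipelining arithmetic: no stalls, one instruction
 * completes per cycle once the pipeline is full (assumed model). */
int main(void) {
    const long n = 1000000;          /* instructions               */
    const long k = 5;                /* pipeline stages (assumed)  */
    long sequential = n * k;         /* one instruction at a time  */
    long pipelined  = n + k - 1;     /* overlapped execution       */
    printf("sequential: %ld cycles, pipelined: %ld cycles, speedup ~%.2fx\n",
           sequential, pipelined, (double)sequential / pipelined);
    return 0;
}

Out-of-order and speculative execution, discussed in this chapter, go further by hiding stalls that a simple in-order pipeline cannot.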
This book is for the growing community of scientists and even engineers who use computing and need a scientific understanding of computer architecture – those who view computation as an intellectual multiplier, and consequently are interested in capabilities, scaling, and limits, not mechanisms; that is, in the scientific principles behind computer architecture and in how to reason about hardware performance for higher-level ends. With the dramatic rise of both data analytics and artificial intelligence, there has been rapid growth in interest and progress in data science. There has also been a shift in the center of mass of computer science upward and outward, into a wide variety of sciences (physical, biological, and social), as well as nearly every aspect of society.
The end of Dennard scaling forced a shift to explicit parallelism and the adoption of multicore parallelism as a vehicle for performance scaling (see Chapter 3, specifically Section 3.3.4). Even with multicore, the continued demand for both higher performance and energy efficiency has driven a growing interest in accelerators. In fact, their use has become so widespread that in many applications effective use of accelerators is a requirement. We discuss why accelerators are attractive and when they can deliver large performance benefits. Specifically, we discuss both graphics processing units (GPUs), which aspire to be general parallel accelerators, and other emerging focused opportunities, such as machine learning accelerators. We close with a broader discussion of where acceleration is most effective, and where it is not. Software architects designing applications will find this perspective on the benefits and challenges of acceleration essential. These criteria will shape the design, evolution, and use of customized accelerator architectures in the future.
What is computable? There is a practical answer to that question, defined by the processors and associated memory hierarchies that we have discussed. This state of the art varies over time with the progress of computer architecture and computing technology (as covered in Chapter 3). We refer to this level of computing performance as a general-purpose computer. There is also a theoretical answer to that question, which we will address in Section 6.4.
In this chapter we review the major dimensions of computer architecture covered in this book, summarizing the high points and providing an overall perspective. Specifically, we highlight how each has shaped computers and computing. Computer architecture continues to advance, so we discuss its ongoing evolution, including the major technology and architecture trends. In many cases the promise of computing is great, but, as with parallelism and accelerators, its progress increasingly comes with compromises. We highlight the critical emerging constraints; outlining them provides a strategic perspective on the likely vectors of change that form a roadmap for the future.
Each computer can perform a set of instructions (basic operations) that move and transform data. To support software, which evolves at a different pace than hardware, the instruction set is a critical interface for compatibility. For the hardware, the instruction set is the specification that must be implemented correctly, and as fast and cheaply as possible. To illustrate these concepts and give a practical understanding, we describe the elements of an instruction set using an emerging open-source instruction set, RISC-V. This concrete example illustrates how an instruction set supports basic software constructs, and the challenges of implementation.
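As a taste of how an instruction set supports basic software constructs, here is a small C loop together with one plausible hand-written RISC-V (RV64) translation. The register assignments and instruction sequence are an illustrative sketch, not compiler output or an excerpt from the chapter.

/* Sum n longs starting at a. */
long sum(const long *a, long n) {
    long s = 0;
    for (long i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* A plausible hand-written RV64 sketch (a0 = a, a1 = n, result in a0):
 *       li   a2, 0          # s = 0
 *       li   a3, 0          # i = 0
 * loop: bge  a3, a1, done   # exit when i >= n
 *       slli a4, a3, 3      # byte offset = i * 8
 *       add  a4, a0, a4     # address of a[i]
 *       ld   a5, 0(a4)      # load a[i]
 *       add  a2, a2, a5     # s += a[i]
 *       addi a3, a3, 1      # i++
 *       j    loop
 * done: mv   a0, a2         # return value in a0
 *       ret
 */

The arithmetic instructions (add, addi), the memory access (ld), and the conditional branch (bge) correspond directly to the basic software constructs named above.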