High-performance computation was first realized in the form of SIMD parallelism with the introduction of the Cray and CDC Cyber vector computers. At first these were single-processor machines, but starting with the Cray X-MP series, multiprocessor vector machines gained the further advantages of MIMD parallelism. Today, vector processing can be built into the architecture of the CPU chip itself. Examples include the AltiVec vector unit of the PowerPC processors used in older Macintosh computers and the SSE and AVX instruction-set extensions of current x86 processors.
The UNIX operating system introduced a design for shared memory MIMD parallel programming. Its components included multitasking, time slicing, semaphores, and the fork system call. On a computer with only one CPU, parallel execution was only apparent: time slicing interleaved the processes, which is called concurrent execution. Nevertheless, the C programming language made it possible to write parallel code. When multiprocessor machines later came online, the same code ran truly in parallel.
Although operating systems still support these tools, the fork model of parallel programming proved too “expensive” in terms of startup time, memory usage, context switching, and overhead. Each forked process receives its own address space, whereas threads within a single process share one address space, so threads are far cheaper to create and to switch between. Threads arose from the search for a better solution and sparked a software revolution. The threads model neatly solves most of the low-level hardware and software implementation issues. This leaves the programmer free to concentrate on the essential logical and synchronization issues of a parallel program design. Today, all popular operating systems support thread-style concurrent and parallel processing.
In this chapter we will explore vector and parallel programming in the context of scientific and engineering numerical applications. The threads model, and indeed parallel programming in general, is most easily implemented on a shared memory multiprocessor architecture.