Book contents
- Frontmatter
- Contents
- Preface
- List of abbreviations
- Part I Computing platforms
- Part II Cloud platforms
- Part III Cloud technologies
- Part IV Cloud development
- Chapter 10 Data in the cloud
- Chapter 11 MapReduce and extensions
- Chapter 12 Dev 2.0 platforms
- Part V Software architecture
- Part VI Enterprise cloud computing
- References
- Index
Chapter 11 - MapReduce and extensions
Published online by Cambridge University Press: 06 December 2010
Summary
The MapReduce programming model was developed at Google in the process of implementing large-scale search and text processing tasks on massive collections of web data stored using BigTable and the GFS distributed file system. The MapReduce programming model is designed for processing and generating large volumes of data via massively parallel computations utilizing tens of thousands of processors at a time. The underlying infrastructure supporting this model must assume that processors and networks can fail, even during a particular computation, and must build in support for handling such failures while ensuring progress of the computations being performed. Hadoop is an open source implementation of the MapReduce model developed at Yahoo, and presumably also used internally. Hadoop is also available on pre-packaged Amazon Machine Images (AMIs) in the Amazon EC2 cloud platform; this availability has sparked interest in applying the MapReduce model for large-scale, fault-tolerant computations in other domains, including applications in the enterprise context.
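The data flow of the model summarized above can be sketched in a few lines. This is a minimal single-process illustration, not Hadoop's actual API: the function names (`map_fn`, `reduce_fn`, `map_reduce`) and the word-count task are assumptions chosen for clarity, and the shuffle that a real framework performs across the network is simulated here by grouping intermediate pairs by key in memory.

```python
from collections import defaultdict

def map_fn(document):
    # Classic word-count mapper: emit a (word, 1) pair for every word.
    for word in document.split():
        yield (word.lower(), 1)

def reduce_fn(word, counts):
    # Reducer receives all values emitted for one key and combines them.
    return (word, sum(counts))

def map_reduce(documents, map_fn, reduce_fn):
    groups = defaultdict(list)
    for doc in documents:                  # "map" phase (parallel in a real framework)
        for key, value in map_fn(doc):
            groups[key].append(value)      # "shuffle": group intermediate pairs by key
    # "reduce" phase: one reduce call per distinct key
    return dict(reduce_fn(k, vs) for k, vs in groups.items())

result = map_reduce(["the cloud", "the web the cloud"], map_fn, reduce_fn)
# result == {"the": 3, "cloud": 2, "web": 1}
```

In a real deployment the map and reduce calls run as independent tasks on many machines, which is what lets the runtime re-execute a failed task elsewhere without restarting the whole computation.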
PARALLEL COMPUTING
Parallel computing has a long history, with its origins in scientific computing in the late 1960s and early 1970s. Different models of parallel computing have been used, based on the nature and evolution of multiprocessor computer architectures. The shared-memory model assumes that any processor can access any memory location, albeit not equally fast. In the distributed-memory model, each processor can address only its own memory and communicates with other processors using message passing over the network.
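The contrast between the two models can be sketched as follows. This is an illustrative assumption, not code from the chapter: threads stand in for processors, a shared counter for shared memory, and a thread-safe queue stands in for the network carrying messages between private memories.

```python
import threading
import queue

# Shared-memory model: all workers update the same memory location,
# so access must be synchronized (here, with a lock).
counter = 0
lock = threading.Lock()

def shared_worker(increments):
    global counter
    for _ in range(increments):
        with lock:               # serialize access to the shared location
            counter += 1

threads = [threading.Thread(target=shared_worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# counter == 4000: four workers each added 1000 to shared memory

# Distributed-memory model: each worker computes in its own private
# state and communicates only by sending messages (the queue plays
# the role of the network).
inbox = queue.Queue()

def message_worker(worker_id, n):
    local = sum(range(n))        # computed entirely in private memory
    inbox.put((worker_id, local))  # result shipped back as a message

workers = [threading.Thread(target=message_worker, args=(i, 10)) for i in range(4)]
for t in workers:
    t.start()
for t in workers:
    t.join()

total = sum(value for _, value in (inbox.get() for _ in range(4)))
# total == 180: each of the 4 workers contributes sum(range(10)) == 45
```

The design difference matters for fault tolerance: in the message-passing style, a worker's state is reconstructible by re-running it, which is the property MapReduce exploits when it re-executes failed tasks.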
- Type: Chapter
- Information: Enterprise Cloud Computing: Technology, Architecture, Applications, pp. 131-143
- Publisher: Cambridge University Press
- Print publication year: 2010