Book contents
- Frontmatter
- Dedication
- Contents
- Preface
- 1 Introduction
- 2 A Warm-up
- 3 Random Sampling
- 4 List Ranking
- 5 Sorting Atomic Items
- 6 Set Intersection
- 7 Sorting Strings
- 8 The Dictionary Problem
- 9 Searching Strings by Prefix
- 10 Searching Strings by Substring
- 11 Integer Coding
- 12 Statistical Coding
- 13 Dictionary-Based Compressors
- 14 Block-Sorting Compression
- 15 Compressed Data Structures
- 16 Conclusion
- Index
14 - Block-Sorting Compression
Published online by Cambridge University Press: 08 June 2023
- Frontmatter
- Dedication
- Contents
- Preface
- 1 Introduction
- 2 A Warm-up
- 3 Random Sampling
- 4 List Ranking
- 5 Sorting Atomic Items
- 6 Set Intersection
- 7 Sorting Strings
- 8 The Dictionary Problem
- 9 Searching Strings by Prefix
- 10 Searching Strings by Substring
- 11 Integer Coding
- 12 Statistical Coding
- 13 Dictionary-Based Compressors
- 14 Block-Sorting Compression
- 15 Compressed Data Structures
- 16 Conclusion
- Index
Summary
This chapter describes a data compression technique devised by Mike Burrows and David Wheeler in 1994 at DEC Systems Research Center. This technique is known as the Burrows–Wheeler Transform (or BWT) and offers a revolutionary alternative to dictionary-based and statistical compressors. It is the algorithmic core of a new class of data compressors (such as bzip2), as well as of new powerful compressed indexes (such as the FM-index). The chapter describes the algorithmic details of the BWT and of two other simple compressors, Move-to-Front and Run-Length Encoding, whose combination constitutes the design core of bzip-based compressors. This description is accompanied by the theoretical analysis of the impact of BWT on data compression, in terms of the k-th order empirical entropy of the input data, and by a sketch of the main algorithmic issues that underlie the design of the first provably compressed suffix array to date, namely the FM-index. Given the technicalities involved in the description of the BWT and the FM-index, this chapter offers several running examples and illustrative figures which should ease their understanding.
- Type
- Chapter
- Information
- Pearls of Algorithm Engineering , pp. 252 - 273Publisher: Cambridge University PressPrint publication year: 2023