This pearl, and the one following, is all about arithmetic coding, a way of doing data compression. Unlike other methods, arithmetic coding does not represent each individual symbol of the text as an integral number of bits; instead, the text as a whole is encoded as a binary fraction in the unit interval. Although the idea can be traced back much earlier, it was not until the publication of an “accessible implementation” by Witten, Neal and Cleary in 1987 that arithmetic coding became a serious competitor in the world of data compression. Over the past two decades the method has been refined and its advantages and disadvantages over rival schemes have been elucidated. Arithmetic coding can be more effective at compression than rivals such as Huffman coding, or Shannon–Fano coding, and is well suited to take account of the statistical properties of the symbols in a text. On the other hand, coding and decoding times are longer than with other methods.
Arithmetic coding has a well-deserved reputation for being tricky to implement; nevertheless, our aim in these two pearls is to give a formal development of the basic algorithms. In the present pearl, coding and decoding are implemented in terms of arbitrary-precision rational arithmetic. This implementation is simple and elegant, though expensive in time and space. In the following pearl, coding and decoding are reimplemented in terms of finite-precision integers. This is where most of the subtleties of the problem reside.
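The interval-narrowing idea behind rational-arithmetic coding can be sketched as follows. This is our own minimal illustration, not the pearl's development: the tiny three-symbol model and the names `model`, `narrow` and `encode` are assumptions for the sake of the example.

```haskell
import Data.Ratio

-- An interval is a pair of rationals (l, r) with 0 <= l < r <= 1.
type Interval = (Rational, Rational)

-- A fixed model assigning each symbol a subinterval of the unit
-- interval (a made-up three-symbol alphabet, for illustration only).
model :: Char -> Interval
model 'a' = (0, 1 % 2)
model 'b' = (1 % 2, 3 % 4)
model 'c' = (3 % 4, 1)
model _   = error "symbol not in model"

-- Narrow an interval to the given subinterval of itself.
narrow :: Interval -> Interval -> Interval
narrow (l, r) (p, q) = (l + (r - l) * p, l + (r - l) * q)

-- The text as a whole is encoded by successive narrowing; any fraction
-- in the resulting interval (plus the text's length) identifies the text.
encode :: String -> Interval
encode = foldl narrow (0, 1) . map model
```

For instance, `encode "ab"` narrows `(0, 1)` first to `(0, 1/2)` and then to `(1/4, 3/8)`; note how arbitrary-precision rationals keep this exact, at the cost in time and space mentioned above.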
Oh what a tangled web we weave when first we practise to derive.
(With apologies to Sir Walter Scott)
Introduction
Consider the problem of generating all bit strings a1a2 … an of length n satisfying given constraints of the form ai ≤ aj for various i and j. The generation is to be in Gray path order, meaning that exactly one bit changes from one bit string to the next. The transition code is a list of integers naming the bit that is to be changed at each step. For example, with n = 3, consider the constraints a1 ≤ a2 and a3 ≤ a2. One possible Gray path is 000, 010, 011, 111, 110 with transition code [2, 3, 1, 3] and starting string 000.
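Replaying a transition code from its starting string recovers the Gray path. The following sketch is ours (the names `flipAt` and `path` are assumptions), using the same 1-based bit numbering as the example above.

```haskell
-- Flip the bit at (1-based) position i of a bit string.
flipAt :: Int -> [Int] -> [Int]
flipAt i as = xs ++ (1 - b) : ys
  where (xs, b : ys) = splitAt (i - 1) as

-- Replay a transition code from a starting string, yielding the Gray
-- path: each step changes exactly one bit.
path :: [Int] -> [Int] -> [[Int]]
path = scanl (flip flipAt)
```

With the example above, `path [0,0,0] [2,3,1,3]` yields the five strings 000, 010, 011, 111, 110.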
The snag is that the problem does not always have a solution. For example, with n = 4 and the constraints a1 ≤ a2 ≤ a4 and a1 ≤ a3 ≤ a4, the six possible bit strings, namely 0000, 0001, 0011, 0101, 0111 and 1111, cannot be permuted into a Gray path. There are four strings of even weight (the number of 1s) and two of odd weight, and in any Gray path the parity of the weights has to alternate.

Constraints of the form ai ≤ aj on bit strings of length n can be represented by a digraph with n nodes in which a directed edge i ← j is associated with a constraint ai ≤ aj.
Imagine a program for generating combinatorial patterns of some kind, patterns such as the subsequences or permutations of a list. Suppose that each pattern is obtained from its predecessor by a single transition. For subsequences a transition i could mean “insert or delete the element at position i”. For permutations a transition i could mean “swap the item in position i with the one in position i − 1”. An algorithm for generating all patterns is called loopless if the first transition is produced in linear time and each subsequent transition in constant time. Note that it is the transitions that are produced in constant time, not the patterns; writing out a pattern is not usually possible in constant time.
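For permutations, the transition just described can be interpreted directly; the sketch below is ours (the name `swapAt` is an assumption), with elements at positions 0 through n−1 so that transition i interchanges the items at positions i and i−1.

```haskell
-- Apply transition i to a permutation: interchange the elements at
-- positions i and i-1 (elements numbered from position 0).
swapAt :: Int -> [a] -> [a]
swapAt i xs = as ++ y : x : bs
  where (as, x : y : bs) = splitAt (i - 1) xs
```

For example, `swapAt 3 "abcd"` swaps 'c' and 'd', giving "abdc". Note that `swapAt` itself takes linear time; a loopless algorithm promises only that each transition is *produced* in constant time.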
Loopless algorithms were formulated in a procedural setting, and many clever tricks, such as the use of focus pointers, doubly linked lists and coroutines, have been used to construct them. This pearl and the following two explore what a purely functional approach can bring to the subject. We will calculate loopless functional versions of the Johnson–Trotter algorithm for producing permutations, the Koda–Ruskey algorithm for generating all prefixes of a forest and its generalisation to Knuth's spider spinning algorithm for generating all bit strings satisfying given inequality constraints. These novel functional algorithms rely on nothing more fancy than lists, trees and queues. The present pearl is mostly devoted to exploring the topic and giving some warm-up exercises.
The setting is a tutorial on functional algorithm design. There are four students: Anne, Jack, Mary and Theo.
Teacher: Good morning class. Today I would like you to solve the following problem. Imagine a set P of people at a party. Say a subset C of P forms a celebrity clique if C is nonempty, everybody at the party knows every member of C, but members of C know only each other. Assuming there is such a clique at the party, your problem is to write a functional program for finding it. As data for the problem you are given a binary predicate knows and the set P as a list ps not containing duplicates.
Jack: Just to be clear, does every member of a celebrity clique actually know everyone else in the clique? And does everyone know themselves?
Teacher: As to the first question, yes, it follows from the definition: everyone in the clique is known by everyone at the party. As to the second question, the answer is not really relevant to the problem, so ask a philosopher. If it simplifies things to assume that x knows x for all x, then go ahead and do so.
Theo: This is going to be a hard problem, isn't it? I mean, the problem of determining whether there is a clique of size k in a party of n people will take Ω(nᵏ) steps, so we are looking at an exponential time algorithm.
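Theo's worry can be made concrete with a brute-force version that tests every nonempty subset of the party against the definition. The sketch below is ours, not the tutorial's solution; the name `cliques` and the sample `knows` relation are assumptions for illustration.

```haskell
import Data.List (subsequences)

-- Brute force: every nonempty subset cs of the party is tested against
-- the two clique conditions directly. There are 2^n subsets, so this
-- takes exponential time, as Theo fears.
cliques :: Eq a => (a -> a -> Bool) -> [a] -> [[a]]
cliques knows ps =
  [ cs | cs <- subsequences ps
       , not (null cs)
       , and [ p `knows` c | p <- ps, c <- cs ]   -- everyone knows every member
       , and [ not (c `knows` p) || p `elem` cs   -- members know only each other
             | c <- cs, p <- ps ] ]

-- A sample party (our assumption): everybody knows person 2 and
-- themselves; person 2 knows nobody else. The celebrity clique is [2].
sampleKnows :: Int -> Int -> Bool
sampleKnows x y = y == 2 || x == y
```

Here `cliques sampleKnows [1,2,3]` returns `[[2]]`.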
The Burrows–Wheeler transform (BWT) is a method for permuting a list with the aim of bringing repeated elements together. Its main use is as a preprocessing step in data compression. Lists with many repeated adjacent elements can be encoded compactly using simple schemes such as run length or move-to-front encoding. The result can then be fed into more advanced compressors, such as Huffman or arithmetic coding, to compress the input even more.
Clearly, the best way of bringing repeated elements together is just to sort the list. But the idea has a major flaw as a preliminary to compression: there is no way to recover the original list unless the complete sorting permutation is also produced as part of the output. Without the ability to recover the original input, data compression is pointless; and if a permutation has to be produced as well, then compression is ineffective. Instead, the BWT achieves a more modest permutation, one that brings some but not all repeated elements into adjacent positions. The main advantage of the BWT is that the transform can be inverted using a single additional piece of information, namely an integer k in the range 0 ≤ k < n, where n is the length of the (nonempty) input list. In this pearl we describe the BWT, identify the fundamental reason why inversion is possible, and use it to derive the inverse transform from its specification.
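The forward transform itself can be sketched directly from this description: sort all rotations of the input, output the last column of the sorted matrix, and record where the original string ends up. The code below is a straightforward (and inefficient, quadratic-space) rendering under those assumptions; the names `rots` and `transform` are ours.

```haskell
import Data.List (sort)

-- All n rotations of a nonempty list of length n.
rots :: [a] -> [[a]]
rots xs = take (length xs) (iterate lrot xs)
  where lrot (y : ys) = ys ++ [y]

-- The BWT: the last column of the sorted rotations, together with the
-- position k of the original input among them.
transform :: Ord a => [a] -> ([a], Int)
transform xs = (map last xss, k)
  where xss = sort (rots xs)
        k   = length (takeWhile (/= xs) xss)
```

For example, `transform "banana"` gives `("nnbaaa", 3)`: the repeated 'n's and 'a's of the input have been brought together, and k = 3 is the single extra integer needed to invert the transform.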
The Johnson–Trotter algorithm is a method for producing all the permutations of a given list in such a way that the transition from one permutation to the next is accomplished by a single transposition of adjacent elements. In this pearl we calculate a loopless version of the algorithm. The main idea is to make use of one of the loopless programs for the generalised boustrophedon product boxall developed in the previous pearl.
A recursive formulation
In the Johnson–Trotter permutation algorithm the transitions for a list of length n, where n is greater than one, are defined recursively in terms of the transitions for a list of length n−1. Label the elements of the list with positions 0 through n−1 and let the list itself be denoted by xs ++ [x]. Begin with a downward run [n−1, n−2,…, 1], where transition i means “interchange the element at position i with the element at position i−1”. The effect is to move x from the last position to the first, resulting in the final permutation [x] ++ xs. For example, the transitions [3, 2, 1] applied to the string “abcd” result in the three permutations “abdc”, “adbc” and “dabc”. Next, suppose the transitions generating the permutations of xs are [j1, j2,…]. Apply the transition j1+1 to the current permutation [x] ++ xs. We have to increase j1 by one because xs is now one position to the right of the “runner” x.
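Continuing this pattern, x then runs back up with [1, 2,…, n−1], the unshifted transition j2 is applied, and so on, the runs alternating and every other ji being bumped by one. The whole recursive scheme can be sketched as follows; this is the naive (not loopless) version, and the names `jt`, `glue` and so on are our own.

```haskell
-- Transition code for all permutations of a list of length n, built
-- recursively as described: a downward run, then each (alternately
-- bumped) transition for n-1 followed by an alternating run of x.
jt :: Int -> [Int]
jt 1 = []
jt n = down ++ concat (zipWith glue (cycle [up, down]) ts)
  where
    down = [n - 1, n - 2 .. 1]   -- x runs from the last position to the first
    up   = [1 .. n - 1]          -- x runs back again
    -- bump every other transition by one: after a downward run, xs sits
    -- one position to the right of the runner x
    ts   = zipWith ($) (cycle [(+ 1), id]) (jt (n - 1))
    glue run t = t : run
```

For instance, `jt 3 = [2,1,2,1,2]`, and in general `jt n` has length n! − 1, one transition per step of the path through all n! permutations. Because `jt n` rebuilds the whole code at once, it is far from loopless; turning it into a loopless algorithm via the boustrophedon product is the business of this pearl.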