Introduction
Consider a discrete memoryless source with source statistics p = (p0, p1, …, pr−1), that is, a sequence U1, U2, … of independent, identically distributed random variables with common distribution P{U = i} = pi, i = 0, 1, …, r − 1. According to the results of Chapter 3, it is in principle possible to represent this source faithfully using an average of bits per source symbol. In this chapter we study an attractive constructive procedure for doing this, called variable-length source coding.
To get the general idea (and the reader is warned that the following example is somewhat meretricious), consider the particular source p = (½, ¼ ⅛ ⅛), whose entropy is H2(½, ¼ ⅛ ⅛) = 1.75 bits. The source alphabet is AU = {0, 1, 2, 3}; now let us encode the source sequence U1, U2, … according to Table 11.1. For example, the source sequence 03220100 … would be encoded into 011111011001000 …. Clearly the average number of bits required per source symbol using this code is ½ · 1 + ¼ · 2 + ⅛ · 3 + ⅛ · 3 = 1.75, the source entropy. This fact in itself is not surprising, since we could, for example, get the average down to only one bit per symbol by encoding everything into 0! The remarkable property enjoyed by the code of Table 11.1 is that it is uniquely decodable, that is, the source sequence can be reconstructed perfectly from the encoded stream.