Existence of Code
A source code $C$ for a random variable $X$ is a mapping from $\mathcal{X}$ to $\mathcal{D}^*$, the set of finite-length strings over a $D$-ary alphabet
- $\mathcal{X}$: the range of $X$
- $D$-ary alphabet: $\mathcal{D}=\{0,1,\cdots,D-1\}$
- $C(x)$: codeword for $x$
- $l(x)$: length of $C(x)$
- $L(C)=\sum_{x\in\mathcal{X}}p(x)l(x)$: the expected length of the source code $C$ for a random variable $X$ with pmf $p(x)$
- rate $R$: average length
Nonsingular code
- every element of $\mathcal{X}$ maps into a different string in $\mathcal{D}^*$
- $C^*$: the extension of a code $C$ is the mapping from finite-length strings of $\mathcal{X}$ to finite-length strings of $\mathcal{D}$, $C^*(x_1x_2\cdots x_n)=C(x_1)C(x_2)\cdots C(x_n)$
- Uniquely decodable: the extension $C^*$ is nonsingular
Prefix Code (instantaneous code)
- no codeword is a prefix of any other codeword
- Tree representation
- Kraft Inequality (1949): For any instantaneous code over an alphabet of size $D$, the codeword lengths $l_1,l_2,\cdots,l_m$ must satisfy the inequality $\sum_{i=1}^mD^{-l_i}\leq 1$
- Conversely, given a set of codeword lengths satisfying this inequality, there exists a prefix code with these lengths (see the sketch after this list)
- Interval representation
- Extended Kraft Inequality: the same inequality $\sum_{i=1}^\infty D^{-l_i}\leq 1$ and its converse hold for any countably infinite set of codewords
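A minimal sketch of both directions in Python (function names are illustrative, not from the notes): check the Kraft sum for a list of lengths, and greedily build a $D$-ary prefix code when the inequality holds.

```python
def kraft_sum(lengths, D=2):
    """Sum of D^{-l_i}; a D-ary prefix code with these lengths exists iff <= 1."""
    return sum(D ** -l for l in lengths)

def prefix_code_from_lengths(lengths, D=2):
    """Greedy converse construction (assumes D <= 10 for one-char digits)."""
    assert kraft_sum(lengths, D) <= 1, "Kraft inequality violated"
    code, val, prev = [], 0, None
    for l in sorted(lengths):                  # assign shortest codewords first
        if prev is not None:
            val = (val + 1) * D ** (l - prev)  # next unused node at depth l
        prev = l
        word, v = "", val
        for _ in range(l):                     # write val as l base-D digits
            word = str(v % D) + word
            v //= D
        code.append(word)
    return code

print(prefix_code_from_lengths([2, 1, 3, 3]))  # ['0', '10', '110', '111']
```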
Optimal Codes
- optimal: $\sum p_il_i$ is minimal
- optimization problem (Lagrange multipliers): $$\min_{l_i} L=\sum p_il_i\quad\text{subject to}\quad\sum D^{-l_i}\leq 1$$
- at the minimum, $p_i=D^{-l_i}$, i.e., $l_i^*=-\log_D p_i$, hence $L^*\geq H_D(X)$
- Bounds: $H_D(X)\leq L^*<H_D(X)+1$
- Shannon codes: $l_i=\lceil\log_D\frac{1}{p_i}\rceil$
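A quick numeric check of these bounds with a made-up pmf (binary alphabet assumed):

```python
import math

def shannon_lengths(p, D=2):
    """Shannon code lengths l_i = ceil(log_D 1/p_i)."""
    return [math.ceil(math.log(1 / pi, D)) for pi in p]

p = [0.4, 0.3, 0.2, 0.1]                    # made-up distribution
ls = shannon_lengths(p)                     # [2, 2, 3, 4]
H = -sum(pi * math.log2(pi) for pi in p)    # H_2(X) ≈ 1.846
L = sum(pi * li for pi, li in zip(p, ls))   # expected length = 2.4
assert H <= L < H + 1
```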
Encode $n$ symbols together (block coding amortizes the $+1$ in $H_D(X)+1$)
- $\frac{H(X_1,X_2,\cdots,X_n)}{n}\leq L_n^*<\frac{H(X_1,X_2,\cdots,X_n)}{n}+\frac{1}{n}$; for a stationary stochastic process, $L_n^*\rightarrow H(\mathcal{X})$, the entropy rate (see the sketch below)
- shortcoming: the block alphabet has $|\mathcal{X}|^n$ symbols, so the code table grows exponentially in $n$
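An illustration under an i.i.d. assumption (distribution made up): Shannon-coding blocks of $n$ symbols pushes the per-symbol rate toward $H(X)$ at rate $1/n$, while the number of table entries $|\mathcal{X}|^n$ grows exponentially.

```python
import itertools, math

def block_rate(p, n):
    """Per-symbol expected Shannon-code length over blocks of n i.i.d. symbols."""
    L = 0.0
    for block in itertools.product(range(len(p)), repeat=n):
        q = math.prod(p[s] for s in block)        # probability of this block
        L += q * math.ceil(math.log2(1 / q))      # Shannon length for the block
    return L / n

p = [0.9, 0.1]                                    # H(X) ≈ 0.469 bits
for n in (1, 2, 4, 8):
    print(n, len(p) ** n, round(block_rate(p, n), 4))   # table size vs. rate
```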
Wrong code
- The expected length under $p(x)$ of the code assignment $l(x)=\lceil\log\frac{1}{q(x)}\rceil$: $H(p)+D(p\|q)\leq E_pl(X)<H(p)+D(p\|q)+1$ (checked numerically below)
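A numeric check of this bound with made-up $p$ and $q$ (here $q$ is dyadic, so the lower bound is met with equality):

```python
import math

p = [0.5, 0.25, 0.25]                  # true distribution
q = [0.25, 0.25, 0.5]                  # mistakenly assumed distribution
l = [math.ceil(math.log2(1 / qi)) for qi in q]               # lengths for q
E_l = sum(pi * li for pi, li in zip(p, l))                   # E_p l(X) = 1.75
H_p = -sum(pi * math.log2(pi) for pi in p)                   # H(p) = 1.5
D_pq = sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))  # D(p||q) = 0.25
assert H_p + D_pq <= E_l < H_p + D_pq + 1
```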
Kraft Inequality for Uniquely Decodable Codes (McMillan): The codeword lengths of any uniquely decodable $D$-ary code must satisfy the Kraft inequality $\sum_i D^{-l_i}\leq 1$. Conversely, given a set of codeword lengths satisfying this inequality, it is possible to construct a uniquely decodable code with these codeword lengths. (so a prefix code loses nothing relative to general uniquely decodable codes)
Codes
Huffman Codes
- $D$-ary Huffman codes (prefix codes) for a given distribution: each time, combine the $D$ symbols with the lowest probabilities into a single source symbol, until only one symbol remains (see the sketch below)
- add dummy symbols of probability 0 so that the total number of symbols has the form $1+k(D-1)$
- optimal: if $C'$ is any other uniquely decodable code, then $L(C)\leq L(C')$
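A minimal binary ($D=2$) Huffman coder sketched with Python's heapq; the counter exists only to keep heap entries comparable and is not part of the algorithm.

```python
import heapq

def huffman_code(probs):
    """probs: dict symbol -> probability; returns dict symbol -> codeword."""
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)        # two lowest-probability subtrees
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, count, merged))
        count += 1                             # tie-breaker; dicts never compared
    return heap[0][2]

print(huffman_code({"a": 0.45, "b": 0.25, "c": 0.15, "d": 0.15}))
# {'a': '0', 'b': '10', 'c': '110', 'd': '111'} (the lengths are what matter)
```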
Shannon Codes
- $D$-adic distribution: $P(X=x_i)=D^{-n_i}$ for integers $n_i$, so $l_i=\log_D\frac{1}{p_i}=n_i$
- Shannon codes
- attain the optimal expected length to within 1 bit
- $D$-adic: Shannon codes are optimal
- $p_i\rightarrow0$: Shannon codes may be much worse than optimal (sketch below)
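A toy comparison with made-up probabilities: the rare symbol gets a 14-bit Shannon codeword even though lengths $(1,1)$ suffice for two symbols.

```python
import math

p = [0.9999, 0.0001]
shannon = [math.ceil(math.log2(1 / pi)) for pi in p]
print(shannon)   # [1, 14]; an optimal (Huffman) code uses lengths [1, 1]
```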
For any optimal coding scheme
- lengths are ordered inversely with probability (if $p_j>p_k$, then $l_j\leq l_k$)
- the two longest codewords have the same length
- two of the longest codewords differ only in the last bit
Shannon-Fano-Elias Coding
- $F(x)=\sum_{a\leq x}p(a)$
- modified cumulative distribution function: $\overline F(x)=\sum_{a<x}p(a)+\frac{1}{2}p(x)=F(x)+\frac{1}{2}p(x)$
- Truncate $\overline{F}(x)$ to $l(x)$ bits: $\overline F(x)-\lfloor\overline{F}(x)\rfloor_{l(x)}<\frac{1}{2^{l(x)}}$
- with $l(x)=\lceil\log\frac{1}{p(x)}\rceil+1$: $\frac{1}{2^{l(x)}}\leq\frac{p(x)}{2}=\overline{F}(x)-F(x-1)$, so the truncated value still lies inside the interval for $x$ and the code is prefix-free (sketch below)
- $L=\sum p(x)l(x)<H(X)+2$
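A sketch of the encoder under the assumptions above (fixed symbol order, float arithmetic good enough for small examples; `sfe_code` is an illustrative name):

```python
import math

def sfe_code(p):
    """Shannon-Fano-Elias: truncate the interval midpoint to l(x) bits."""
    codes, F = [], 0.0
    for px in p:
        Fbar = F + px / 2                      # modified CDF (interval midpoint)
        l = math.ceil(math.log2(1 / px)) + 1
        bits, frac = "", Fbar
        for _ in range(l):                     # first l bits of binary expansion
            frac *= 2
            bits += str(int(frac))
            frac -= int(frac)
        codes.append(bits)
        F += px
    return codes

print(sfe_code([0.25, 0.5, 0.125, 0.125]))     # ['001', '10', '1101', '1111']
```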
Random Variable Generation
- fair coin tosses $Z_1,Z_2,\cdots$
- expected number of fair bits $E(T)$
- generate the variable by walking a binary tree whose leaves are labeled with the symbols of $X$
- $E(T)\geq H(X)$
- dyadic distribution: $E(T)=H(X)$
- otherwise: use the binary expansions of the probabilities to build the tree; $H(X)\leq E(T)<H(X)+2$ (sketch of the dyadic case below)
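A sketch for the dyadic case, assuming the probabilities are exact powers of $1/2$ (the `generate` interface is an illustration, not from the notes): build the canonical tree from the leaf depths, then flip fair bits until a leaf is reached.

```python
import math, random

def generate(probs):
    """probs: dict symbol -> dyadic probability summing to 1.
    Returns (number of flips T, symbol); E[T] = H(X) for dyadic probs."""
    # Leaf depth l(s) = log2(1/p(s)); a canonical prefix code gives the tree.
    leaves, val, prev = {}, 0, None
    for s, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        l = round(math.log2(1 / p))
        if prev is not None:
            val = (val + 1) << (l - prev)      # next unused node at depth l
        prev = l
        leaves[format(val, f"0{l}b")] = s
    word = ""
    while word not in leaves:                  # flip fair coins until a leaf
        word += random.choice("01")
    return len(word), leaves[word]

print(generate({"a": 0.5, "b": 0.25, "c": 0.25}))   # e.g. (1, 'a') half the time
```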
Universal Source Coding
- minimax redundancy: $R^*=\min_q\max_{p_\theta}R=\min_q\max_{p_\theta}D(p_\theta\|q)$
- Arithmetic Coding
- LZ77: sliding window
- LZ78: Trie (sketch below)
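A minimal LZ78 parser sketch: each output phrase is a longest previously seen phrase plus one new symbol; a plain dict stands in for the trie.

```python
def lz78_parse(s):
    """Return (index, symbol) pairs; index points to an earlier phrase (0 = empty)."""
    dictionary = {"": 0}                   # phrase -> index; plays the trie's role
    out, phrase = [], ""
    for ch in s:
        if phrase + ch in dictionary:
            phrase += ch                   # keep extending the current match
        else:
            out.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:                             # input ended inside a known phrase
        out.append((dictionary[phrase], ""))
    return out

print(lz78_parse("ABBABBABBBAABABAA"))
# [(0, 'A'), (0, 'B'), (2, 'A'), (2, 'B'), (1, 'B'), ...]
```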