Existence of Code

A source code $C$ for a random variable $X$ is a mapping from $\mathcal{X}$ to $\mathcal{D}^*$, the set of finite-length strings of symbols from a $D$-ary alphabet $\mathcal{D}$

  • $\mathcal{X}$: the range of $X$
  • $\mathcal{D}=\{0,1,\cdots,D-1\}$: the $D$-ary alphabet
  • $C(x)$: codeword for $x$
  • $l(x)$: length of $C(x)$
  • $L(C)=\sum_{x\in\mathcal{X}}p(x)l(x)$: the expected length of the source code $C$ for the random variable $X$
  • rate $R$: expected length per source symbol

Nonsingular code

  • every element of $\mathcal{X}$ maps into a different string in $\mathcal{D}^*$
  • $C^*$: the extension of a code $C$, the mapping from finite-length strings of $\mathcal{X}$ to finite-length strings of $\mathcal{D}$ defined by $C^*(x_1x_2\cdots x_n)=C(x_1)C(x_2)\cdots C(x_n)$
  • Uniquely decodable: the extension $C^*$ is nonsingular

Prefix Code (instantaneous code)

  • no codeword is a prefix of any other codeword
  • Tree representation
    • Kraft Inequality (1949): For any instantaneous code over an alphabet of size $D$, the codeword lengths $l_1,l_2,\cdots,l_m$ must satisfy the inequality $\sum_{i=1}^mD^{-l_i}\leq 1$
    • Conversely, given a set of codeword lengths that satisfy this inequality, there exists a prefix code with these lengths
  • Interval representation
    • Extended Kraft Inequality: for any countably infinite prefix code, the codeword lengths satisfy $\sum_{i=1}^{\infty}D^{-l_i}\leq 1$; conversely, given lengths satisfying this inequality, a prefix code with those lengths can be constructed (see the sketch below)
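
A minimal sketch of both directions in Python; `to_base` and `prefix_code_from_lengths` are hypothetical helper names, and the construction is the standard one from the converse proof (assign codewords in order of increasing length):

```python
def to_base(n, D, width):
    """Write integer n as a width-digit base-D string."""
    digits = []
    for _ in range(width):
        digits.append(str(n % D))
        n //= D
    return "".join(reversed(digits))

def prefix_code_from_lengths(lengths, D=2):
    """Converse of the Kraft inequality: canonical construction that
    assigns codewords in order of increasing length."""
    assert sum(D ** -l for l in lengths) <= 1, "lengths violate Kraft"
    codes, w, prev = [], 0, 0
    for l in sorted(lengths):
        w *= D ** (l - prev)   # extend the current value to the new depth
        codes.append(to_base(w, D, l))
        w += 1                 # skip the subtree below the word just assigned
        prev = l
    return codes

print(prefix_code_from_lengths([1, 2, 3, 3]))  # ['0', '10', '110', '111']
```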

Optimal Codes

  • optimal: $\sum p_il_i$ is minimal
  • optimization problem (Lagrange multipliers): $$\min_{l_i} L=\sum p_il_i\quad\text{s.t.}\quad\sum D^{-l_i}\leq 1$$
  • solution: $p_i=D^{-l_i}$, i.e. $l_i=-\log_D p_i$; hence $L^*\geq H_D(X)$
  • Bounds: $H_D(X)\leq L^*<H_D(X)+1$
  • Shannon codes: $l_i=\lceil\log_D\frac{1}{p_i}\rceil$
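
A quick numerical check of the Shannon lengths and the bound $H_D(X)\leq L^*<H_D(X)+1$, sketched in Python for the binary case $D=2$; on a dyadic distribution the expected length meets the entropy exactly:

```python
from math import ceil, log2

def shannon_lengths(p):
    """Binary Shannon code lengths l_i = ceil(log2(1/p_i))."""
    return [ceil(log2(1 / pi)) for pi in p]

p = [0.5, 0.25, 0.125, 0.125]
lengths = shannon_lengths(p)                     # [1, 2, 3, 3]
L = sum(pi * li for pi, li in zip(p, lengths))   # expected length
H = -sum(pi * log2(pi) for pi in p)              # binary entropy H_2(X)
print(lengths, L, H)  # dyadic case: L == H == 1.75, the lower bound is tight
```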

Encode $n$ symbols together (block encoding spreads the $+1$ in $H_D(X)+1$ over $n$ symbols; see the sketch after this list)

  • $\frac{H(X_1,X_2,\cdots,X_n)}{n}\leq L^*<\frac{H(X_1,X_2,\cdots,X_n)}{n}+\frac{1}{n}$; for a stationary stochastic process, $L^*\rightarrow H(\mathcal{X})$, the entropy rate
  • shortcoming: the block alphabet has size $|\mathcal{X}|^n$, which grows exponentially in $n$
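
A sketch of the amortization effect, assuming an i.i.d. binary source in Python; the Shannon-code rate per symbol drops toward $H(X)$ as the block length $n$ grows:

```python
from itertools import product
from math import ceil, log2

p = {"a": 0.9, "b": 0.1}  # skewed source: the +1 overhead is large relative to H

def shannon_rate(n):
    """Bits per symbol of a Shannon code applied to blocks of n i.i.d. symbols."""
    rate = 0.0
    for block in product(p, repeat=n):
        q = 1.0
        for s in block:
            q *= p[s]          # probability of the whole block
        rate += q * ceil(log2(1 / q))
    return rate / n

H = -(0.9 * log2(0.9) + 0.1 * log2(0.1))  # ~0.469 bits
for n in (1, 2, 4, 8):
    print(n, shannon_rate(n))  # rate decreases toward H as n grows
```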

Wrong code

  • The expected length under $p(x)$ of the code assignment $l(x)=\lceil\log\frac{1}{q(x)}\rceil$ satisfies $H(p)+D(p\|q)\leq E_p l(X)<H(p)+D(p\|q)+1$
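
A small check of this bound in Python; the distributions $p$ and $q$ below are illustrative, and since $q$ is dyadic the expected length lands exactly on $H(p)+D(p\|q)$:

```python
from math import ceil, log2

p = [0.5, 0.25, 0.25]   # true distribution
q = [0.25, 0.25, 0.5]   # mismatched distribution the code was designed for

l = [ceil(log2(1 / qi)) for qi in q]          # l(x) = ceil(log 1/q(x))
E_l = sum(pi * li for pi, li in zip(p, l))    # expected length under p
H_p = -sum(pi * log2(pi) for pi in p)
D_pq = sum(pi * log2(pi / qi) for pi in p)    # KL divergence D(p||q)
print(E_l, H_p + D_pq)  # 1.75 and 1.75: D(p||q) bits are the price of mismatch
```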

Kraft Inequality for Uniquely Decodable Codes (McMillan): The codeword lengths of any uniquely decodable $D$-ary code must satisfy the Kraft inequality $\sum_i D^{-l_i}\leq 1$. Conversely, given a set of codeword lengths that satisfy this inequality, it is possible to construct a uniquely decodable code with these codeword lengths. (So restricting attention to prefix codes loses nothing: a prefix code is 'no worse than' any other uniquely decodable code.)

Codes

Huffman Codes

  • $D$-ary Huffman codes (prefix codes) for a given distribution: each time, combine the $D$ symbols with the lowest probabilities into a single source symbol; repeat until only one symbol remains
  • if needed, add dummy symbols of probability 0 so that the total number of symbols is $1+k(D-1)$ for some integer $k$
  • optimal: if $C'$ is any other uniquely decodable code, then $L(C)\leq L(C')$ (a heap-based sketch follows this list)
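
A heap-based sketch of binary Huffman construction ($D=2$), assuming Python; the tie-breaking counter is an implementation detail, not part of the algorithm:

```python
import heapq

def huffman(p):
    """Binary Huffman code: repeatedly merge the two least-probable
    (super)symbols. p: dict symbol -> probability; returns symbol -> codeword."""
    heap = [(prob, i, {sym: ""}) for i, (sym, prob) in enumerate(p.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)  # keeps tuple comparison away from the dicts
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

code = huffman({"a": 0.45, "b": 0.25, "c": 0.15, "d": 0.15})
print(code)  # {'a': '0', 'b': '10', 'c': '110', 'd': '111'}, L(C) = 1.85
```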

Shannon Codes

  • $D$-adic distribution: $P(X=x_i)=D^{-n_i}$ for integers $n_i$, so $l_i=\log_D\frac{1}{p_i}=n_i$
  • Shannon codes
    • attain optimality within 1
    • $D$-adic: Shannon codes are optimal
    • $p_i\rightarrow0$: the Shannon length $\lceil\log\frac{1}{p_i}\rceil$ may be much longer than optimal (example below)
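
A tiny illustration of the last point, assuming Python: when one probability is very small, the Shannon length $\lceil\log\frac{1}{p_i}\rceil$ can far exceed what an optimal (Huffman) code would use:

```python
from math import ceil, log2

p = [0.9999, 0.0001]
shannon = [ceil(log2(1 / pi)) for pi in p]
# any binary code on two symbols can use lengths [1, 1] (e.g. '0' and '1'),
# so the Shannon codeword for the rare symbol is 13 bits longer than needed
print(shannon)  # [1, 14]
```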

For any optimal coding scheme

  • lengths are ordered inversely with probability (if $p_j>p_k$, then $l_j\leq l_k$)
  • the two longest codewords have the same length
  • two of the longest codewords differ only in the last bit

Shannon-Fano-Elias Coding

  • cumulative distribution function: $F(x)=\sum_{a\leq x}p(a)$
  • modified cumulative distribution function: $\overline F(x)=\sum_{a<x}p(a)+\frac{1}{2}p(x)=F(x)+\frac{1}{2}p(x)$
  • Truncate $\overline{F}(x)$ to $l(x)$ bits, denoted $\lfloor\overline{F}(x)\rfloor_{l(x)}$: the truncation error is $\overline F(x)-\lfloor\overline{F}(x)\rfloor_{l(x)}<\frac{1}{2^{l(x)}}$
  • with $l(x)=\lceil\log\frac{1}{p(x)}\rceil+1$: $\frac{1}{2^{l(x)}}\leq\frac{p(x)}{2}=\overline{F}(x)-F(x-1)$, so the truncated value still lies in the interval of $x$ and the code is prefix-free
  • $L=\sum p(x)l(x)<H(X)+2$
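
A sketch of the encoder in Python; the symbol order and the dyadic example distribution are illustrative choices:

```python
from math import ceil, log2

def sfe_code(p):
    """Shannon-Fano-Elias: the codeword for x is F-bar(x) truncated to
    l(x) = ceil(log2(1/p(x))) + 1 bits."""
    codes, F = {}, 0.0
    for x, px in p.items():        # symbols taken in a fixed order
        Fbar = F + px / 2          # midpoint of x's interval: F-bar(x)
        l = ceil(log2(1 / px)) + 1
        v, bits = Fbar, ""
        for _ in range(l):         # binary expansion, truncated to l bits
            v *= 2
            bits += str(int(v))
            v -= int(v)
        codes[x] = bits
        F += px                    # advance the CDF
    return codes

print(sfe_code({"a": 0.25, "b": 0.5, "c": 0.125, "d": 0.125}))
# {'a': '001', 'b': '10', 'c': '1101', 'd': '1111'}: a prefix code, L < H + 2
```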

Random Variable Generation

  • source of randomness: a sequence of fair coin tosses $Z_1, Z_2,\cdots$
  • cost: expected number of fair bits $E(T)$ used per generated sample
  • generate $X$ by walking a binary tree whose leaves are labeled with symbols of $\mathcal{X}$ (simulation sketch after this list)
  • $E(T)\geq H(X)$
    • dyadic distribution: $E(T)=H(X)$
    • otherwise: split each probability into dyadic pieces via its binary expansion; then $H(X)\leq E(T)<H(X)+2$
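
A simulation sketch of the dyadic case in Python; the tree is written as a map from symbols to leaf paths, and `generate` is a hypothetical helper name:

```python
import random
from collections import Counter

def generate(tree):
    """Flip fair bits and walk down a binary tree (symbol -> leaf path)
    until a leaf is hit; returns the symbol and the number of bits used."""
    leaves = set(tree.values())
    path = ""
    while path not in leaves:
        path += random.choice("01")
    symbol = next(s for s, w in tree.items() if w == path)
    return symbol, len(path)

# dyadic p = (1/2, 1/4, 1/8, 1/8): leaves at depths log2(1/p), H(X) = 1.75 bits
tree = {"a": "0", "b": "10", "c": "110", "d": "111"}
counts, bits, N = Counter(), 0, 100_000
for _ in range(N):
    x, t = generate(tree)
    counts[x] += 1
    bits += t
print({s: c / N for s, c in counts.items()}, bits / N)  # ~p and E(T) ~ 1.75
```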

Universal Source Coding

  • minimax redundancy: $R^*=\min_q\max_{p_\theta}R=\min_q\max_{p_\theta}D(p_\theta\|q)$
  • Arithmetic Coding
  • LZ77: sliding window
  • LZ78: Trie
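
A minimal LZ78 parser sketch in Python (a dict of phrases stands in for the trie); the input string is an arbitrary example:

```python
def lz78_parse(s):
    """LZ78: greedily split s into phrases, each equal to an earlier phrase
    plus one new symbol; the phrase dictionary grows like a trie."""
    dictionary = {"": 0}              # phrase -> index
    out, phrase = [], ""
    for ch in s:
        if phrase + ch in dictionary:
            phrase += ch              # keep extending the longest match
        else:
            out.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:
        out.append((dictionary[phrase], ""))  # flush a dangling match
    return out

print(lz78_parse("ABBABBABBBA"))
# [(0, 'A'), (0, 'B'), (2, 'A'), (2, 'B'), (1, 'B'), (4, 'A')]
# i.e. the phrases A | B | BA | BB | AB | BBA
```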