Preliminaries

  • A Mathematical Theory of Communication - Shannon 1948
  • Convexity (Jensen's inequality): for convex $f$, $f(\sum_ip_ix_i)\leq \sum_ip_if(x_i)$
    • equivalently, $f(E_pX)\leq E_pf(X)$
    • $f$ is convex if $f''(x)\geq 0$
  • $f(x)=-x\log x$ is concave
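
A quick numerical sanity check of Jensen's inequality and of the concavity of $f(x)=-x\log x$; the weights and sample points below are arbitrary choices for illustration:

```python
import numpy as np

# Jensen's inequality for a convex f: f(E[X]) <= E[f(X)].
p = np.array([0.2, 0.5, 0.3])      # arbitrary weights p_i (a distribution)
x = np.array([1.0, 2.0, 4.0])      # arbitrary points x_i
f = lambda t: t**2                 # convex, since f''(t) = 2 >= 0
print(f(p @ x) <= p @ f(x))        # True

# f(x) = -x log x is concave, so the inequality reverses.
g = lambda t: -t * np.log2(t)
y = np.array([0.1, 0.4, 0.7])      # arbitrary points in (0, 1)
print(g(p @ y) >= p @ g(y))        # True
```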

Entropy

Entropy: $H(X)=-\sum_{x\in\mathcal{X}}p(x)\log p(x)=E_{p}\log\frac{1}{p(X)}$

  • $0\log 0\rightarrow 0$
  • $0\leq H(X)\leq \log|\mathcal{X}|$
  • uniform $X$: $H(X)=\log|\mathcal{X}|$
  • $H_b(X)=(\log_ba)H_a(X)$
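
A minimal sketch of the definition above, assuming a hand-rolled `entropy` helper and an arbitrary example distribution; the prints check the bounds, the uniform case, and the base-change identity:

```python
import numpy as np

def entropy(p, base=2):
    """H(X) = -sum p(x) log p(x), using the convention 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz])) / np.log(base)

p = [0.5, 0.25, 0.125, 0.125]                 # arbitrary example distribution
print(entropy(p))                             # 1.75 bits
print(0 <= entropy(p) <= np.log2(len(p)))     # True: 0 <= H(X) <= log|X|
print(entropy([0.25] * 4))                    # uniform: log2(4) = 2.0 bits
print(np.isclose(entropy(p, base=4),
                 (np.log(2) / np.log(4)) * entropy(p, base=2)))
# True: H_b(X) = (log_b a) H_a(X) with a = 2, b = 4
```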

Joint Entropy: $H(X,Y)=-E\log p(X,Y)=-\sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}}p(x,y)\log p(x,y)$

  • $H(X,X) = H(X)$
  • $H(X,Y) = H(Y,X)$
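
A small numerical check of these properties on an arbitrary joint pmf; the `H` helper below is just the entropy definition restated in code:

```python
import numpy as np

def H(p):
    """Entropy in bits of a pmf given as an array (0 log 0 -> 0)."""
    p = np.asarray(p, dtype=float).ravel()
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

# Arbitrary joint pmf p(x, y); rows are values of X, columns values of Y.
pxy = np.array([[1/8, 1/16, 1/16],
                [1/4, 1/4,  1/4 ]])
px = pxy.sum(axis=1)                         # marginal p(x)

print(H(pxy))                                # H(X,Y) = 2.375 bits
print(np.isclose(H(pxy), H(pxy.T)))          # True: H(X,Y) = H(Y,X)
print(np.isclose(H(np.diag(px)), H(px)))     # True: H(X,X) = H(X)
```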

Conditional Entropy: $H(Y|X)=\sum_{x\in\mathcal{X}}p(x)H(Y|X=x)=-\sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}}p(x,y)\log p(y|x)=-E\log p(Y|X)$

  • $H(Y|X)\leq H(Y)$: conditioning reduces entropy
  • remaining uncertainty about $Y$ when $X$ is known
  • $H(X|Y)\neq H(Y|X)$
  • $H(X|Y)+H(Y)=H(Y|X)+H(X)=H(X,Y)$
  • Chain rule: $H(X_1,X_2,\cdots,X_n)=H(X_1)+H(X_2|X_1)+\cdots+H(X_n|X_{n-1},\cdots,X_1)$
  • Zero conditional entropy: $H(Y|X)=0$ iff $Y=f(X)$, i.e. $Y$ is a deterministic function of $X$
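
Continuing with the same style of arbitrary joint pmf, a sketch that computes $H(Y|X)$ directly from the definition and then checks the chain rule, the conditioning inequality, and the asymmetry:

```python
import numpy as np

def H(p):
    """Entropy in bits of a pmf array (0 log 0 -> 0)."""
    p = np.asarray(p, dtype=float).ravel()
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

pxy = np.array([[1/8, 1/16, 1/16],           # arbitrary joint pmf p(x, y)
                [1/4, 1/4,  1/4 ]])
px, py = pxy.sum(axis=1), pxy.sum(axis=0)

# H(Y|X) = sum_x p(x) H(Y | X = x), straight from the definition.
H_Y_given_X = sum(px[i] * H(pxy[i] / px[i]) for i in range(len(px)))

print(np.isclose(H(px) + H_Y_given_X, H(pxy)))   # True: H(X) + H(Y|X) = H(X,Y)
print(H_Y_given_X <= H(py))                      # True: conditioning reduces entropy
H_X_given_Y = H(pxy) - H(py)
print(np.isclose(H_X_given_Y, H_Y_given_X))      # False: H(X|Y) != H(Y|X) in general
```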

Relative Entropy (Kullback-Leibler distance): $D(p|q)=\sum_{x\in\mathcal{X}}p(x)\log\frac{p(x)}{q(x)}=E_p\log\frac{p(X)}{q(X)}=E_p(-\log q(X))-H(p)$

  • $D(p|q)\geq 0$, equality iff $p(x)=q(x)$ for all $x$ (proof via Jensen's inequality / concavity of $\log$)
  • variational distance $V(p,q)=\sum_{x\in\mathcal{X}}|p(x)-q(x)|$
  • Pinsker's inequality: $D(p|q)\geq\frac{1}{2\ln 2}V^2(p,q)$
  • $D(p|u)=\log|\mathcal{X}|-H(X)$, where $u$ is the uniform distribution on $\mathcal{X}$
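
A numerical sketch for relative entropy; the `D` helper and the example pair $(p, u)$ are assumptions made here for illustration, not library calls:

```python
import numpy as np

def D(p, q):
    """Relative entropy D(p|q) in bits; assumes q(x) > 0 wherever p(x) > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    nz = p > 0
    return np.sum(p[nz] * np.log2(p[nz] / q[nz]))

p = np.array([0.5, 0.25, 0.125, 0.125])      # arbitrary distribution
u = np.array([0.25, 0.25, 0.25, 0.25])       # uniform on the same alphabet

print(D(p, u) >= 0, np.isclose(D(p, p), 0))  # True True: D >= 0, zero iff p = q
V = np.sum(np.abs(p - u))                    # variational distance V(p, u)
print(D(p, u) >= V**2 / (2 * np.log(2)))     # True: Pinsker's inequality
H_p = -np.sum(p * np.log2(p))
print(np.isclose(D(p, u), np.log2(len(p)) - H_p))  # True: D(p|u) = log|X| - H(X)
```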

Mutual Information: $I(X;Y)=\sum_x\sum_yp(x,y)\log\frac{p(x,y)}{p(x)p(y)}=D(p(x,y)|p(x)p(y))=E_{p(x,y)}\log\frac{p(X,Y)}{p(X)p(Y)}$

  • $I(X;Y)=0$ iff $X$ and $Y$ are independent
  • $I(X;X)=H(X)$
  • $I(X;Y)=H(X)-H(X|Y)=H(X)+H(Y)-H(X,Y)$
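
A sketch that computes $I(X;Y)$ as $D(p(x,y)|p(x)p(y))$ for an arbitrary joint pmf and checks the identities above; the `H` and `I` helpers are defined locally, not library functions:

```python
import numpy as np

def H(p):
    """Entropy in bits of a pmf array (0 log 0 -> 0)."""
    p = np.asarray(p, dtype=float).ravel()
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

def I(pxy):
    """I(X;Y) = D(p(x,y) | p(x)p(y)) for a joint pmf given as a 2-D array."""
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    prod = np.outer(px, py)
    nz = pxy > 0
    return np.sum(pxy[nz] * np.log2(pxy[nz] / prod[nz]))

pxy = np.array([[1/8, 1/16, 1/16],           # arbitrary joint pmf p(x, y)
                [1/4, 1/4,  1/4 ]])
px, py = pxy.sum(axis=1), pxy.sum(axis=0)

print(np.isclose(I(pxy), H(px) + H(py) - H(pxy)))  # True: I = H(X)+H(Y)-H(X,Y)
print(np.isclose(I(np.outer(px, py)), 0))          # True: independent => I(X;Y) = 0
print(np.isclose(I(np.diag(px)), H(px)))           # True: I(X;X) = H(X)
```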

Conditional Mutual Information: $I(X;Y|Z)=H(X|Z)-H(X|Y,Z)=E_{p(x,y,z)}\log\frac{p(X,Y|Z)}{p(X|Z)p(Y|Z)}$

  • Chain rule: $I(X_1,X_2,\cdots,X_n;Y)=\sum_{i=1}^nI(X_i;Y|X_{i-1},\cdots,X_1)$
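
The chain rule can be checked mechanically by writing every term as a combination of joint entropies; the random pmf over $(X_1,X_2,Y)$ below is an arbitrary example:

```python
import numpy as np

def H(p):
    """Entropy in bits of a pmf array (0 log 0 -> 0)."""
    p = np.asarray(p, dtype=float).ravel()
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

rng = np.random.default_rng(0)
p = rng.random((2, 2, 2))
p /= p.sum()                       # arbitrary joint pmf with axes (X1, X2, Y)

# Each term below is expressed through joint entropies of the relevant marginals.
I_x1x2_y = H(p.sum(axis=2)) + H(p.sum(axis=(0, 1))) - H(p)                  # I(X1,X2;Y)
I_x1_y = H(p.sum(axis=(1, 2))) + H(p.sum(axis=(0, 1))) - H(p.sum(axis=1))   # I(X1;Y)
I_x2_y_given_x1 = (H(p.sum(axis=2)) - H(p.sum(axis=(1, 2)))                 # H(X2|X1)
                   - (H(p) - H(p.sum(axis=1))))                             # - H(X2|X1,Y)
print(np.isclose(I_x1x2_y, I_x1_y + I_x2_y_given_x1))   # True: chain rule
```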

Conditional Relative Entropy: $D(p(y|x)|q(y|x))=E_{p(x,y)}\log \frac{p(Y|X)}{q(Y|X)}$

Inequality

  • $H(X_1,X_2,\cdots,X_n)\leq\sum_{i=1}^nH(X_i)$ equality iff $X_i$ are independent
  • $I(X;Y|Z)$ vs. $I(X;Y)$: neither is larger in general
    • $X\rightarrow Y\rightarrow Z$ (Markov chain): $I(X;Z|Y)=0$
    • $X,Y$ independent uniform bits, $Z=X+Y\bmod 2$: $I(X;Y|Z)=1>0=I(X;Y)$ (checked numerically after this list)
  • Data Processing Inequality: $X\rightarrow Y\rightarrow Z,I(X;Y)\geq I(X;Z)$
    • $X\rightarrow Y\rightarrow Z,I(X;Y)=I(X;Z)+I(X;Y|Z)$
  • $I(X;Y;Z)=I(X;Y)-I(X;Y|Z)$
  • Perfect Secrecy: $H(X)\leq H(Z)$, i.e. the entropy (length) of the message is at most that of the key
  • Fano’s Inequality: relationship between $P_e$ and $H(X|Y)$
    • $X\rightarrow Y\rightarrow \hat X,P_e=P(\hat X\neq X)$
    • $H(P_e)+P_e\log|\mathcal{X}|\geq H(X|\hat X)\geq H(X|Y)$
    • weaken: $P_e\geq\frac{H(X|Y)-1}{\log |\mathcal{X}|}$
  • Log sum inequality: for nonnegative $a_i,b_i$, $\sum_{i=1}^na_i\log\frac{a_i}{b_i}\geq\left(\sum_{i=1}^na_i\right)\log\frac{\sum_{i=1}^na_i}{\sum_{i=1}^nb_i}$, equality iff $\frac{a_i}{b_i}$ is constant
  • $H(p)$ is a concave function of $p$
  • $D(p|q)$ is convex in the pair $(p,q)$
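
A numerical check of the $Z=X+Y\bmod 2$ example from the list above, with $X,Y$ independent uniform bits; every quantity is computed from joint entropies of the constructed pmf:

```python
import numpy as np

def H(p):
    """Entropy in bits of a pmf array (0 log 0 -> 0)."""
    p = np.asarray(p, dtype=float).ravel()
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

# X, Y independent uniform bits, Z = X + Y mod 2; joint pmf p(x, y, z).
p = np.zeros((2, 2, 2))
for x in (0, 1):
    for y in (0, 1):
        p[x, y, (x + y) % 2] = 0.25

I_xy = H(p.sum(axis=(1, 2))) + H(p.sum(axis=(0, 2))) - H(p.sum(axis=2))   # I(X;Y)
# I(X;Y|Z) = H(X|Z) - H(X|Y,Z) = H(X,Z) - H(Z) - H(X,Y,Z) + H(Y,Z)
I_xy_given_z = H(p.sum(axis=1)) - H(p.sum(axis=(0, 1))) - H(p) + H(p.sum(axis=0))
print(I_xy, I_xy_given_z)   # 0.0 1.0 -> conditioning can increase mutual information
```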