Preliminary
- A Mathematical Theory of Communication - Shannon 1948
- Convexity (Jensen's inequality): for convex $f$, $f(\sum_ip_ix_i)\leq \sum_ip_if(x_i)$
- equivalently, $f(E_pX)\leq E_pf(X)$
- $f$ is convex if $f''(x)\geq 0$
- $f(x)=-x\log x$ is concave
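A quick numeric sanity check of Jensen's inequality in Python (the weights and sample points below are arbitrary examples):

```python
import numpy as np

p = np.array([0.2, 0.5, 0.3])              # probability weights (example values)
x = np.array([1.0, 2.0, 4.0])              # arbitrary sample points

f = lambda t: t ** 2                       # convex: f''(t) = 2 >= 0
assert f(np.dot(p, x)) <= np.dot(p, f(x))  # f(E[X]) <= E[f(X)]

g = lambda t: -t * np.log2(t)              # concave on (0, 1]
u = np.array([0.1, 0.4, 0.8])
assert g(np.dot(p, u)) >= np.dot(p, g(u))  # inequality flips for concave g
```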
Entropy
Entropy: $H(X)=-\sum_{x\in\mathcal{X}}p(x)\log p(x)=E_{p}\log\frac{1}{p(X)}$
- $0\log 0\rightarrow 0$
- $0\leq H(X)\leq \log|\mathcal{X}|$
- uniform $X$: $H(X)=\log|\mathcal{X}|$
- $H_b(X)=(\log_ba)H_a(X)$
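A minimal sketch of the definition in Python; the `entropy` helper and the example distributions are illustrative, not from the source:

```python
import numpy as np

def entropy(p, base=2):
    """H(X) = -sum_x p(x) log p(x), with the 0 log 0 -> 0 convention."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]                                # drop zero-probability outcomes
    return -np.sum(nz * np.log(nz)) / np.log(base)

p = [0.5, 0.25, 0.25]
print(entropy(p))                                # 1.5 bits, between 0 and log2(3)
print(entropy([0.25] * 4))                       # uniform: log2(4) = 2 bits, the maximum
print(entropy(p, base=4), 0.5 * entropy(p))      # base change: H_4 = (log_4 2) H_2
```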
Joint Entropy: $H(X,Y)=-E\log p(X,Y)=-\sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}}p(x,y)\log p(x,y)$
- $H(X,X) = H(X)$
- $H(X,Y) = H(Y,X)$
Conditional Entropy: $H(Y|X)=\sum_{x\in\mathcal{X}}p(x)H(Y|X=x)=-\sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}}p(x,y)\log p(y|x)=-E\log p(Y|X)$
- $H(Y|X)\leq H(Y)$ (conditioning reduces entropy; equality iff $X$ and $Y$ are independent)
- remaining uncertainty about $Y$ when $X$ is known
- $H(X|Y)\neq H(Y|X)$
- $H(X|Y)+H(Y)=H(Y|X)+H(X)=H(X,Y)$
- Chain rule: $H(X_1,X_2,\cdots,X_n)=H(X_1)+H(X_2|X_1)+\cdots+H(X_n|X_{n-1},\cdots,X_1)$
- Zero conditional entropy: if $H(Y|X)=0$, then $Y$ is a function of $X$, i.e. $Y=f(X)$
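A small check of the chain rule $H(X,Y)=H(X)+H(Y|X)$ and of conditioning reducing entropy, on an assumed 2x2 joint table:

```python
import numpy as np

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    nz = p[p > 0]
    return -np.sum(nz * np.log2(nz))

pxy = np.array([[0.25, 0.25],                      # rows: x, columns: y (example values)
                [0.40, 0.10]])
px = pxy.sum(axis=1)

H_y_given_x = sum(px[i] * H(pxy[i] / px[i]) for i in range(len(px)))
assert np.isclose(H(pxy), H(px) + H_y_given_x)     # H(X,Y) = H(X) + H(Y|X)
assert H_y_given_x <= H(pxy.sum(axis=0))           # H(Y|X) <= H(Y)
```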
Relative Entropy (Kullback-Leibler distance): $D(p\|q)=\sum_{x\in\mathcal{X}}p(x)\log\frac{p(x)}{q(x)}=E_p\log\frac{p(X)}{q(X)}=E_p[-\log q(X)]-H(p)$
- $D(p\|q)\geq 0$, equality iff $p(x)=q(x)$ for all $x$ (prove via Jensen's inequality, using the concavity of $\log$)
- variational distance: $V(p,q)=\sum_{x\in\mathcal{X}}|p(x)-q(x)|$
- Pinsker's inequality: $D(p\|q)\geq\frac{1}{2\ln 2}V^2(p,q)$
- for $u$ uniform on $\mathcal{X}$: $D(p\|u)=\log|\mathcal{X}|-H(p)$
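A sketch of $D(p\|q)$ with two of the facts above verified numerically (the distributions are assumed examples):

```python
import numpy as np

def kl(p, q):
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0                                   # 0 log(0/q) is taken as 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

def H(p):
    p = np.asarray(p, float)
    nz = p[p > 0]
    return -np.sum(nz * np.log2(nz))

p = np.array([0.7, 0.2, 0.1])
u = np.full(3, 1 / 3)                              # uniform on a 3-element alphabet

assert np.isclose(kl(p, u), np.log2(3) - H(p))     # D(p||u) = log|X| - H(p)
V = np.abs(p - u).sum()                            # variational distance
assert kl(p, u) >= V ** 2 / (2 * np.log(2))        # Pinsker's inequality
```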
Mutual Information: $I(X;Y)=\sum_x\sum_yp(x,y)\log\frac{p(x,y)}{p(x)p(y)}=D(p(x,y)\|p(x)p(y))=E_{p(x,y)}\log\frac{p(X,Y)}{p(X)p(Y)}$
- if $X$ and $Y$ are independent: $I(X;Y)=0$
- $I(X;X)=H(X)$
- $I(X;Y)=H(X)-H(X|Y)=H(X)+H(Y)-H(X,Y)$
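A numeric check that the expressions for $I(X;Y)$ agree, on an assumed toy joint table:

```python
import numpy as np

def H(p):
    p = np.asarray(p, float).ravel()
    nz = p[p > 0]
    return -np.sum(nz * np.log2(nz))

pxy = np.array([[0.30, 0.20],                      # example joint p(x, y)
                [0.10, 0.40]])
px, py = pxy.sum(axis=1), pxy.sum(axis=0)

I = H(px) + H(py) - H(pxy)                         # I = H(X) + H(Y) - H(X,Y)
D = np.sum(pxy * np.log2(pxy / np.outer(px, py)))  # D(p(x,y) || p(x)p(y))
assert np.isclose(I, D) and I >= 0
```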
Conditional Mutual Information: $I(X;Y|Z)=H(X|Z)-H(X|Y,Z)=E_{p(x,y,z)}\log\frac{p(X,Y|Z)}{p(X|Z)p(Y|Z)}$
- Chain rule: $I(X_1,X_2,\cdots,X_n;Y)=\sum_{i=1}^nI(X_i;Y|X_{i-1},\cdots,X_1)$
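A check of the chain rule for $n=2$ on a random joint distribution, writing each term via joint entropies (a sketch, not from the source):

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.random((2, 2, 2))
p /= p.sum()                                       # random joint p(x1, x2, y)

def H(p):
    nz = p[p > 0]
    return -np.sum(nz * np.log2(nz))

I_x12_y = H(p.sum(axis=2)) + H(p.sum(axis=(0, 1))) - H(p)                  # I(X1,X2;Y)
I_x1_y = H(p.sum(axis=(1, 2))) + H(p.sum(axis=(0, 1))) - H(p.sum(axis=1))  # I(X1;Y)
I_x2_y_x1 = H(p.sum(axis=2)) - H(p.sum(axis=(1, 2))) \
            - H(p) + H(p.sum(axis=1))                                      # I(X2;Y|X1)
assert np.isclose(I_x12_y, I_x1_y + I_x2_y_x1)
```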
Conditional Relative Entropy: $D(p(y|x)\|q(y|x))=E_{p(x,y)}\log \frac{p(Y|X)}{q(Y|X)}$
Inequality
- $H(X_1,X_2,\cdots,X_n)\leq\sum_{i=1}^nH(X_i)$ equality iff $X_i$ are independent
- no general ordering between $I(X;Y|Z)$ and $I(X;Y)$:
- $X\rightarrow Y\rightarrow Z$: $I(X;Z|Y)=0$
- independent fair bits $X,Y$ with $Z=X+Y\bmod 2$: $I(X;Y|Z)>I(X;Y)$ (numeric check after this list)
- Data Processing Inequality: $X\rightarrow Y\rightarrow Z,I(X;Y)\geq I(X;Z)$
- $X\rightarrow Y\rightarrow Z,I(X;Y)=I(X;Z)+I(X;Y|Z)$
- $I(X;Y;Z)=I(X;Y)-I(X;Y|Z)$
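The XOR example worked out numerically: with $X,Y$ independent fair bits and $Z=X+Y\bmod 2$, $I(X;Y)=0$ yet $I(X;Y|Z)=1$ bit:

```python
import numpy as np

def H(p):
    nz = p[p > 0]
    return -np.sum(nz * np.log2(nz))

p = np.zeros((2, 2, 2))                            # joint p(x, y, z)
for x in (0, 1):
    for y in (0, 1):
        p[x, y, x ^ y] = 0.25                      # X, Y independent fair bits, Z = X xor Y

I_xy = H(p.sum(axis=(1, 2))) + H(p.sum(axis=(0, 2))) - H(p.sum(axis=2))
# I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z)
I_xy_z = H(p.sum(axis=1)) + H(p.sum(axis=0)) - H(p.sum(axis=(0, 1))) - H(p)
print(I_xy, I_xy_z)                                # 0.0 and 1.0
```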
- Perfect Secrecy: $H(X)\leq H(Z)$ (the message length is at most the key length)
- Fano’s Inequality: relationship between $P_e$ and $H(X|Y)$
- $X\rightarrow Y\rightarrow \hat X,P_e=P(\hat X\neq X)$
- $H(P_e)+P_e\log|\mathcal{X}|\geq H(X|\hat X)\geq H(X|Y)$
- weaken: $P_e\geq\frac{H(X|Y)-1}{\log |\mathcal{X}|}$
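A numeric instance of Fano's inequality on a binary symmetric channel with uniform input; the crossover probability is an assumed example value:

```python
import numpy as np

def Hb(p):                                         # binary entropy function
    return 0.0 if p in (0, 1) else -p * np.log2(p) - (1 - p) * np.log2(1 - p)

eps = 0.11                                         # BSC crossover probability (assumed)
# X uniform bit, Y = X flipped w.p. eps, estimator X_hat = Y, so P_e = eps
Pe = eps
H_x_given_y = Hb(eps)                              # H(X|Y) for the BSC with uniform input
assert Hb(Pe) + Pe * np.log2(2) >= H_x_given_y     # Fano, with |X| = 2
assert Pe >= (H_x_given_y - 1) / np.log2(2)        # weakened form
```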
- Log sum inequality: for nonnegative numbers, $\sum_{i=1}^na_i\log\frac{a_i}{b_i}\geq\left(\sum_{i=1}^na_i\right)\log\frac{\sum_{i=1}^na_i}{\sum_{i=1}^nb_i}$, with equality iff $\frac{a_i}{b_i}$ is constant
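A quick numeric instance of the log sum inequality (the values of $a_i,b_i$ are arbitrary):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 1.0, 3.0])
lhs = np.sum(a * np.log2(a / b))
rhs = a.sum() * np.log2(a.sum() / b.sum())
assert lhs >= rhs                                  # here lhs = 1.0, rhs = 0.0

# equality when a_i / b_i is constant, e.g. b = 2a:
assert np.isclose(np.sum(a * np.log2(a / (2 * a))),
                  a.sum() * np.log2(a.sum() / (2 * a.sum())))
```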
- $H(p)$ is concave in $p$
- $D(p\|q)$ is convex in the pair $(p,q)$
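A spot-check of these last two facts at assumed example points (one $\lambda$, two pairs of distributions; a numeric illustration, not a proof):

```python
import numpy as np

def H(p):
    nz = p[p > 0]
    return -np.sum(nz * np.log2(nz))

def kl(p, q):
    return np.sum(p * np.log2(p / q))              # assumes strictly positive entries

p1, p2 = np.array([0.9, 0.1]), np.array([0.3, 0.7])
q1, q2 = np.array([0.5, 0.5]), np.array([0.6, 0.4])
lam = 0.4

# H is concave in p
assert H(lam * p1 + (1 - lam) * p2) >= lam * H(p1) + (1 - lam) * H(p2)
# D(p||q) is jointly convex in the pair (p, q)
assert kl(lam * p1 + (1 - lam) * p2, lam * q1 + (1 - lam) * q2) \
       <= lam * kl(p1, q1) + (1 - lam) * kl(p2, q2)
```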