Entropy
- (continuous) $X$ with cumulative distribution function $F(x)=Pr(X\leq x)$
- support set of $X$: $S=\{x:f(x)>0\}$
- differential entropy $h(X)$: $h(X)=-\int_S f(x)\log f(x)\,dx$
- $h(X+c) = h(X)$
- $h(aX)=h(X)+\log|a|$
- $h(AX)=h(X)+\log|\det A|$
- $h(X)$ may be negative (since $f(x)$ may exceed $1$)
- uniform on $[0,a]$: $h(X)=\log a$
- Gaussian: $h(X)=\frac{1}{2}\log 2\pi e\sigma^2$
- $h(X)$: a continuous $X$ carries infinite information (an exact description would require infinitely many bits)
- so $h(X)$ does not serve as a measure of the average amount of information in $X$
- $h(X_1,X_2,\cdots,X_n)=-\int f(x^n)\log f(x^n)dx^n$
- $h(X|Y)=-\int f(x,y)\log f(x|y)\,dx\,dy$
- Relative Entropy: $D(f\|g)=\int f\log\frac{f}{g}\geq 0$
- mutual information: $I(X;Y)=\int f(x,y)\log\frac{f(x,y)}{f(x)f(y)}dxdy\geq 0$
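A quick numerical sanity check of the Gaussian formula and the scaling property $h(aX)=h(X)+\log|a|$ (a minimal sketch in natural log/nats; NumPy/SciPy and the parameter values are illustration choices):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, sigma, a = 200_000, 2.0, 3.0

# Monte Carlo estimate of h(X) = E[-log f(X)] for X ~ N(0, sigma^2), in nats
x = rng.normal(0.0, sigma, size=n)
h_mc = -np.mean(norm.logpdf(x, scale=sigma))
h_closed = 0.5 * np.log(2 * np.pi * np.e * sigma**2)   # 1/2 log(2 pi e sigma^2)

# Scaling property: aX ~ N(0, (a*sigma)^2), so h(aX) should equal h(X) + log|a|
h_scaled = -np.mean(norm.logpdf(a * x, scale=a * sigma))

print(h_mc, h_closed)               # both ~2.11
print(h_scaled, h_mc + np.log(a))   # agree up to Monte Carlo error
```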
Relation to discrete
- $X^\Delta=x_i$ if $i\Delta\leq x<(i+1)\Delta$
- $p_i=Pr(X^{\Delta}=x_i)=f(x_i)\Delta$
- $H(X^{\Delta})=-\sum\Delta f(x_i)\log f(x_i)-\log \Delta$
- as $\Delta\rightarrow 0,H(X^\Delta)+\log \Delta\rightarrow h(f)=h(X)$
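A small numerical illustration of $H(X^\Delta)+\log\Delta\rightarrow h(X)$ for a standard Gaussian (a sketch; the truncation to $[-10,10]$ and the grid sizes are arbitrary choices):

```python
import numpy as np
from scipy.stats import norm

h_true = 0.5 * np.log(2 * np.pi * np.e)   # h(X) for X ~ N(0,1), ~1.4189 nats

for delta in [1.0, 0.1, 0.01]:
    edges = np.arange(-10.0, 10.0 + delta, delta)   # bin edges i*Delta
    p = np.diff(norm.cdf(edges))                    # p_i = Pr(i*Delta <= X < (i+1)*Delta)
    p = p[p > 0]
    H_delta = -np.sum(p * np.log(p))                # discrete entropy of the quantized X
    print(delta, H_delta + np.log(delta), h_true)   # converges to h(X) as Delta -> 0
```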
AEP
- $-\frac{1}{n}\log f(X_1,X_2,\cdots,X_n)\rightarrow E(-\log f(X))=h(f)$ in probability, for i.i.d. $X_i\sim f$
- $A_\epsilon^{(n)}=\{(x_1,x_2,\cdots,x_n)\in S^n:|-\frac{1}{n}\log f(x_1,\cdots, x_n)-h(X)|\leq\epsilon\}$
- $\text{Vol}(A)=\int_Adx_1dx_2\cdots dx_n$
- Properties
- $Pr(A_\epsilon^{(n)})>1-\epsilon$ for $n$ sufficiently large
- $\text{Vol}(A_\epsilon^{(n)})\leq 2^{n(h(X)+\epsilon)}$
- $\text{Vol}(A_\epsilon^{(n)})\geq (1-\epsilon)2^{n(h(X)-\epsilon)}$
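A Monte Carlo illustration of the AEP statement above for i.i.d. standard Gaussians (a sketch; the block count and $\epsilon=0.05$ are arbitrary):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
h = 0.5 * np.log(2 * np.pi * np.e)   # h(X) for X ~ N(0,1), in nats
eps = 0.05

for n in [10, 100, 10_000]:
    x = rng.normal(size=(500, n))                 # 500 independent blocks of length n
    z = -norm.logpdf(x).sum(axis=1) / n           # -(1/n) log f(X_1,...,X_n) per block
    frac_typical = np.mean(np.abs(z - h) <= eps)  # fraction of blocks landing in A_eps^(n)
    print(n, z.mean(), frac_typical)              # mean -> h(X), fraction -> 1
```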
Covariance Matrix
- $\text{cov}(X,Y)=E(X-EX)(Y-EY)=E(XY)-(EX)(EY)$
- $\vec X$: $K_X=E(X-EX)(X-EX)^T=[\text{cov}(X_i,X_j)]$
- correlation matrix: $\widetilde K_X=EXX^T=[EX_iX_j]$
- symmetric and positive semidefinite
- $K_X=\widetilde K_X-(EX)(EX)^T$
- $Y=AX$
- $K_Y=AK_XA^T$
- $\widetilde K_Y=A\widetilde K_XA^T$
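A quick empirical check of $K_Y=AK_XA^T$ and $K_X=\widetilde K_X-(EX)(EX)^T$ on sample moments (a sketch; the dimensions and random matrices are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 100_000, 3

# correlated samples with a nonzero mean, one row per realization of X
X = rng.normal(size=(n, d)) @ rng.normal(size=(d, d)) + rng.normal(size=d)
A = rng.normal(size=(2, d))
Y = X @ A.T                                   # Y = A X, applied row-wise

K_X = np.cov(X, rowvar=False, bias=True)      # sample covariance matrix of X
K_Y = np.cov(Y, rowvar=False, bias=True)
print(np.allclose(K_Y, A @ K_X @ A.T))        # K_Y = A K_X A^T holds exactly for sample covariances

corr_X = (X.T @ X) / n                        # sample version of E[X X^T]
m = X.mean(axis=0)
print(np.allclose(K_X, corr_X - np.outer(m, m)))   # K_X = K~_X - (EX)(EX)^T
```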
Multivariate Normal Distribution
$f(x)=\frac{1}{(2\pi)^{\frac{n}{2}}|K|^{\frac{1}{2}}}\exp(-\frac{1}{2}(x-\mu)^TK^{-1}(x-\mu))$
- jointly Gaussian: uncorrelated implies independent
- $h(X_1,X_2,\cdots,X_n)=h(\mathcal{N}(\mu, K))=\frac{1}{2}\log(2\pi e)^n|K|$
- the mutual information between $X$ and $Y$ is $I(X;Y)=\sup_{P,Q}I([X]_P;[Y]_Q)$ over all finite partitions $P$ and $Q$
- Correlated Gaussian $(X,Y)\sim\mathcal{N}(0,K)$ $$K=\begin{bmatrix}\sigma^2 & \rho\sigma^2\\ \rho\sigma^2 & \sigma^2\end{bmatrix}$$
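Plugging the entropies above into $I(X;Y)=h(X)+h(Y)-h(X,Y)$ gives the standard closed form for this pair (a short derivation, same log base throughout):

$$I(X;Y)=\tfrac{1}{2}\log 2\pi e\sigma^2+\tfrac{1}{2}\log 2\pi e\sigma^2-\tfrac{1}{2}\log(2\pi e)^2|K|=-\tfrac{1}{2}\log(1-\rho^2),$$

since $|K|=\sigma^4(1-\rho^2)$; it is $0$ for $\rho=0$ and grows without bound as $\rho\to\pm 1$.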
Maximum Entropy
- $X\in\mathbb{R}$ with mean $\mu$ and variance $\sigma^2$: $h(X)\leq\frac{1}{2}\log 2\pi e\sigma^2$, with equality iff $X\sim\mathcal{N}(\mu, \sigma^2)$
- $X\in\mathbb{R}$ with $EX^2\leq \sigma^2$: $h(X)\leq\frac{1}{2}\log 2\pi e\sigma^2$
- Problem: find density $f$ over $S$ meeting moment constraints $\alpha_1,\cdots,\alpha_m$
- $f(x)\geq 0$
- $\int_S f(x)dx=1$
- $\int_S f(x)r_i(x)dx=\alpha_i$
- Maximum entropy distribution: $f^*(x)=f_\lambda(x)=e^{\lambda_0+\sum_{i=1}^m\lambda_ir_i(x)}$
- $S=[a,b]$ with no other constraints: uniform distribution over this range
- $S=[0,\infty), EX=\mu$, then $f(x)=\frac{1}{\mu}e^{-\frac{x}{\mu}}$
- $S=(-\infty, \infty), EX=\alpha_1,EX^2=\alpha_2$, then $\mathcal{N}(\alpha_1,\alpha_2-\alpha_1^2)$
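A quick check of the variance-constrained case: among unit-variance densities, the Gaussian entropy should be the largest (a sketch comparing three closed forms in nats; the particular competitors are arbitrary choices):

```python
import numpy as np

# closed-form differential entropies (nats) of three unit-variance densities
h_gauss   = 0.5 * np.log(2 * np.pi * np.e)     # N(0,1)
h_uniform = np.log(np.sqrt(12.0))              # Uniform on an interval of width sqrt(12), Var = 1
h_laplace = 1.0 + np.log(np.sqrt(2.0))         # Laplace with scale b = 1/sqrt(2), Var = 2b^2 = 1, h = 1 + log(2b)
print(h_gauss, h_uniform, h_laplace)           # ~1.419, ~1.242, ~1.347  (Gaussian is the largest)
```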
Inequalities
Hadamard’s Inequality
- $K$ is a nonnegative definite symmetric $n\times n$ matrix
- (Hadamard) $|K|\leq\prod K_{ii}$ with equality iff $K_{ij}=0,i\neq j$
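A two-line numerical check of Hadamard's inequality on a random positive semidefinite matrix (a sketch; the size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(4, 4))
K = A @ A.T                                        # random positive semidefinite matrix
print(np.linalg.det(K) <= np.prod(np.diag(K)))     # Hadamard: |K| <= prod K_ii  -> True

D = np.diag(np.diag(K))                            # diagonal K: equality case
print(np.isclose(np.linalg.det(D), np.prod(np.diag(D))))
```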
Balanced Information Inequality
- $h(X,Y)\leq h(X)+h(Y)$
- but neither $h(X,Y)\geq h(X)$ nor $h(X,Y)\leq h(X)$ holds in general
- $[n]=\{1,2,\cdots,n\}$; for $\alpha\subset[n]$, $X_\alpha=(X_i:i\in\alpha)$
- a linear continuous inequality $\sum_\alpha w_\alpha h(X_\alpha)\geq 0$ is valid iff its corresponding discrete counterpart $\sum_\alpha w_\alpha H(X_\alpha)\geq 0$ is valid and balanced
Han’s Inequality
- $h_k^{(n)}=\frac{1}{\binom{n}{k}}\sum_{S:|S|=k}\frac{h(X(S))}{k}$
- $g_k^{(n)}=\frac{1}{\binom{n}{k}}\sum_{S:|S|=k}\frac{h(X(S)|X(S^c))}{k}$
- Han’s Inequality: $h_1^{(n)}\geq h_2^{(n)}\geq\cdots\geq h_n^{(n)}=h(X_1,\cdots,X_n)/n=g_n^{(n)}\geq\cdots\geq g_2^{(n)}\geq g_1^{(n)}$
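A numerical check of the whole chain for a Gaussian vector, where $h(X(S))=\frac{1}{2}\log(2\pi e)^{|S|}|K_S|$ and $h(X(S)|X(S^c))=h(X_1,\cdots,X_n)-h(X(S^c))$ (a sketch; $n=4$ and the random covariance are arbitrary):

```python
import numpy as np
from itertools import combinations
from math import comb

rng = np.random.default_rng(4)
n = 4
A = rng.normal(size=(n, n))
K = A @ A.T + np.eye(n)                 # covariance of a Gaussian vector X

def h_gauss(idx):
    """h(X(S)) = 1/2 log((2 pi e)^|S| det(K_S)) for the Gaussian X, in nats."""
    sub = K[np.ix_(idx, idx)]
    return 0.5 * np.log((2 * np.pi * np.e) ** len(idx) * np.linalg.det(sub))

h_full = h_gauss(list(range(n)))
h_k, g_k = [], []
for k in range(1, n + 1):
    hk = gk = 0.0
    for S in combinations(range(n), k):
        Sc = [i for i in range(n) if i not in S]
        hk += h_gauss(list(S)) / k
        gk += (h_full - (h_gauss(Sc) if Sc else 0.0)) / k   # h(X(S)|X(S^c)) = h(X) - h(X(S^c))
    h_k.append(hk / comb(n, k))
    g_k.append(gk / comb(n, k))

print(np.round(h_k, 3))   # nonincreasing, ends at h(X_1,...,X_n)/n
print(np.round(g_k, 3))   # nondecreasing, ends at the same value
```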
Information of Heat
- Heat equation (Fourier's heat conduction equation): $x$ is position and $t$ is time, $\frac{\partial}{\partial t}f(x, t)=\frac{1}{2}\frac{\partial^2}{\partial x^2}f(x,t)$
- $Y_t=X+\sqrt{t}Z,Z\sim\mathcal{N}(0,1)$, then $f(y;t)=\int f(x)\frac{1}{\sqrt{2\pi t}}e^{-\frac{(y-x)^2}{2t}}dx$
- Gaussian channel and the heat equation: the output density $f(y;t)$ above satisfies the heat equation in $t$
- Fisher Information: $I(X)=\int_{-\infty}^{+\infty}f(x)[\frac{\frac{\partial}{\partial x}f(x)}{f(x)}]^2dx$
- De Bruijn’s Identity: $\frac{\partial}{\partial t}h(Y_t)=\frac{1}{2}I(Y_t)$
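A consistency check of de Bruijn's identity for Gaussian input (a worked special case with natural logarithms, not a proof):

$$X\sim\mathcal{N}(0,\sigma^2)\Rightarrow Y_t\sim\mathcal{N}(0,\sigma^2+t),\quad h(Y_t)=\tfrac{1}{2}\log 2\pi e(\sigma^2+t),\quad \frac{\partial}{\partial t}h(Y_t)=\frac{1}{2(\sigma^2+t)}=\tfrac{1}{2}I(Y_t),$$

since the Fisher information of $\mathcal{N}(0,\sigma^2+t)$ is $1/(\sigma^2+t)$.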
Entropy power inequality
- EPI (Entropy power inequality): for independent $X$ and $Y$, $e^{\frac{2}{n}h(X+Y)}\geq e^{\frac{2}{n}h(X)}+e^{\frac{2}{n}h(Y)}$; "the most powerful tool" (an equality check for independent Gaussians follows this list)
- Uncertainty principle
- Young’s inequality
- Nash’s inequality
- Cramer-Rao bound: $\text{Var}(\hat\theta)\geq\frac{1}{I(\theta)}$ for any unbiased estimator $\hat\theta$
- FII (Fisher information inequality): for independent $X$ and $Y$, $\frac{1}{I(X+Y)}\geq\frac{1}{I(X)}+\frac{1}{I(Y)}$
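For independent scalar Gaussians the EPI holds with equality, which is a quick way to remember its form (a worked special case with natural logarithms, so that $e^{2h(X)}=2\pi e\sigma_1^2$):

$$X\sim\mathcal{N}(0,\sigma_1^2),\ Y\sim\mathcal{N}(0,\sigma_2^2)\ \text{independent}\Rightarrow e^{2h(X+Y)}=2\pi e(\sigma_1^2+\sigma_2^2)=e^{2h(X)}+e^{2h(Y)}.$$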