Information Theory
Shannon Entropy
KL Divergence
Cross Entropy
Entropy and Mutual Information
Note
For a random variable (rv) \(X\) with PMF \(p_X\), written \(X\sim p_X\), the quantity \(H(X)=-\sum_x p_X(x)\lg(p_X(x))\) defines the entropy, which is a measure of the uncertainty in \(X\).
For two rvs \(X\) and \(Y\) with joint distribution \(p_{X,Y}(x,y)\), the quantity \(I(X,Y)=\sum_x\sum_y p_{X,Y}(x,y)\lg\left(\frac{p_{X,Y}(x,y)}{p_X(x)p_Y(y)}\right)\) defines the mutual information.
[Prove] Let \(H(X,Y)=-\sum_x\sum_y p_{X,Y}(x,y)\lg(p_{X,Y}(x,y))\) be the joint entropy. Then
\[I(X,Y)=H(X)+H(Y)-H(X,Y)\]
[Prove] Let
\[H(X|Y)=-\sum_y p_Y(y)\sum_x p_{X|Y}(x|y)\lg(p_{X|Y}(x|y))=\mathbb{E}_Y\left[-\sum_x p_{X|Y}(x|y)\lg(p_{X|Y}(x|y))\right]\]
This can be thought of as the expected conditional entropy. Then
\[I(X,Y)=H(X)-H(X|Y)\]
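The two identities for \(I(X,Y)\) can be checked numerically before attempting the proofs. The sketch below is only an illustration, not a proof: the joint PMF `joint` is a made-up example, the helper names are hypothetical, and \(\lg\) is taken to be \(\log_2\) (entropy in bits), with terms of zero mass skipped.

```python
import math

def entropy(pmf):
    """Shannon entropy in bits; zero-mass terms contribute nothing."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

def marginals(joint):
    """Marginals p_X (row sums) and p_Y (column sums) of a joint PMF."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    return p_x, p_y

def mutual_information(joint):
    """I(X,Y) computed directly from its definition."""
    p_x, p_y = marginals(joint)
    return sum(
        p * math.log2(p / (p_x[i] * p_y[j]))
        for i, row in enumerate(joint)
        for j, p in enumerate(row)
        if p > 0
    )

def conditional_entropy(joint):
    """H(X|Y): entropy of p_{X|Y}(.|y), averaged over p_Y."""
    _, p_y = marginals(joint)
    total = 0.0
    for j, py in enumerate(p_y):
        if py > 0:
            conditional = [row[j] / py for row in joint]
            total += py * entropy(conditional)
    return total

# Hypothetical joint PMF: rows index x, columns index y.
joint = [[0.4, 0.1],
         [0.1, 0.4]]

p_x, p_y = marginals(joint)
h_x, h_y = entropy(p_x), entropy(p_y)
h_xy = entropy([p for row in joint for p in row])
mi = mutual_information(joint)

print("I(X,Y)            =", mi)
print("H(X)+H(Y)-H(X,Y)  =", h_x + h_y - h_xy)
print("H(X)-H(X|Y)       =", h_x - conditional_entropy(joint))
```

All three printed values should agree up to floating-point rounding.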
Tip
The term \(I(X,Y)\) can be thought of as the reduction in entropy (from \(H(X)\)) once we observe \(Y\). It is therefore the information about \(X\) conveyed by \(Y\).
[Prove] If \(X\perp\!\!\!\perp Y\), what is the mutual information?
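As a quick sanity check for the question above, one can build a joint PMF that factorizes as \(p_{X,Y}(x,y)=p_X(x)p_Y(y)\) and evaluate the definition of \(I(X,Y)\) on it. The marginals below are arbitrary illustrative values, and \(\lg\) is again assumed to mean \(\log_2\); the result is a hint at the answer, not a substitute for the proof.

```python
import math

# Hypothetical marginals, chosen only for illustration.
p_x = [0.2, 0.3, 0.5]
p_y = [0.6, 0.4]

# Under independence the joint PMF is the product of the marginals.
joint = [[px * py for py in p_y] for px in p_x]

# Every ratio inside the logarithm equals 1, so every term vanishes.
mi = sum(
    joint[i][j] * math.log2(joint[i][j] / (p_x[i] * p_y[j]))
    for i in range(len(p_x))
    for j in range(len(p_y))
)

print("I(X,Y) under independence:", mi)
```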
Attention
[Prove] Let the PMF of \(X\in\{x_1,\cdots,x_n\}\) be defined by the masses \(p_1,\cdots,p_n\) such that \(\sum_{i=1}^n p_i=1\). Let \(q_1,\cdots,q_n\) be another PMF such that \(\sum_{i=1}^n q_i=1\). Then \(H(X)\leq-\sum_{i=1}^n p_i\lg(q_i)\), and equality holds only when \(p_i=q_i\) for all \(i\).
[Hint] Use the inequality \(\ln(\alpha)\leq\alpha-1\) for \(\alpha>0\), where equality holds only at \(\alpha=1\).
As a special case of the above, \(H(X)\leq\lg(n)\) and the equality holds when \(p_i=\frac{1}{n}\) for all \(i\).
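The inequality and its special case can be probed empirically on randomly drawn PMFs. This is a minimal sketch under stated assumptions: `random_pmf` is a hypothetical helper that normalizes uniform random weights, \(\lg\) is taken to be \(\log_2\), and a small tolerance absorbs floating-point error. The right-hand side \(-\sum_i p_i\lg(q_i)\) is the cross entropy of \(p\) and \(q\), so the gap computed below is the KL divergence from \(q\) to \(p\).

```python
import math
import random

def entropy(p):
    """Shannon entropy in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """-sum_i p_i lg(q_i); assumes q_i > 0 wherever p_i > 0."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def random_pmf(n, rng):
    """Hypothetical helper: normalize n strictly positive random weights."""
    weights = [rng.random() + 1e-9 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5
uniform = [1.0 / n] * n

gaps = []            # cross_entropy(p, q) - H(p), expected to be >= 0
max_entropy = 0.0    # largest H(p) seen, expected to be <= lg(n)
for _ in range(1000):
    p = random_pmf(n, rng)
    q = random_pmf(n, rng)
    gaps.append(cross_entropy(p, q) - entropy(p))
    max_entropy = max(max_entropy, entropy(p))

print("smallest gap over all trials:", min(gaps))
print("largest entropy seen:", max_entropy, "vs lg(n) =", math.log2(n))
print("entropy of the uniform PMF:", entropy(uniform))
```

The smallest gap should be non-negative, and the largest entropy seen should stay below \(\lg(n)\), which the uniform PMF attains.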
Important
Information Theory Notes from Introduction To Renormalization
[SO] Estimate the Kullback–Leibler (KL) divergence with Monte Carlo