Applied Linear Algebra¶
Principal Component Analysis¶
Note
We have a data matrix \(\mathbf{X}\in\mathbb{R}^{n\times m}\) which represents a sample of size \(n\).
Each row represents an observation \((\mathbf{x}^*_i)^\top\in\mathbb{R}^m\), \(i=1,\dots,n\).
We want to reduce the dimensionality of the data to \(k\ll m\) while keeping as much information about the data as possible.
Note
Let \(\bar{\mathbf{X}}\) be the centred version of \(\mathbf{X}\), obtained by subtracting the sample mean of each column.
Then \(\mathbf{C}=\frac{1}{n}\bar{\mathbf{X}}^\top\bar{\mathbf{X}}\) is the sample variance-covariance matrix.
Total variance
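\[\mathbb{V}(\bar{\mathbf{X}})=\frac{1}{n}\sum_{i=1}^n\|\bar{\mathbf{x}}^*_i\|^2=\frac{1}{n}\text{trace}(\bar{\mathbf{X}}^\top\bar{\mathbf{X}})=\frac{1}{n}(\sigma_1^2+\cdots+\sigma_r^2)\]
Here \(\sigma_1\ge\cdots\ge\sigma_r>0\) are the non-zero singular values of \(\bar{\mathbf{X}}=\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^\top\) and \(r\) is its rank. The variance of the data along a unit vector \(\mathbf{v}\) is \(\frac{1}{n}\|\bar{\mathbf{X}}\mathbf{v}\|^2\), which attains its maximum \(\sigma_1^2/n\) at the first right singular vector \(\mathbf{v}_1\). Therefore, \(\mathbf{v}_1\) provides the direction in which the variance of the data is maximised.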
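We can create the best rank-\(k\) approximation of \(\bar{\mathbf{X}}\) (in the Frobenius norm) by \(\bar{\mathbf{X}}_k=\mathbf{U}_k\boldsymbol{\Sigma}_k\mathbf{V}_k^\top\), where \(\mathbf{U}_k\), \(\boldsymbol{\Sigma}_k\) and \(\mathbf{V}_k\) keep the \(k\) largest singular values of \(\bar{\mathbf{X}}\) and the corresponding singular vectors. This achieves the goal.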
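The steps above can be sketched numerically with NumPy. This is a minimal illustration on synthetic data, not a reference implementation: the sizes `n`, `m`, `k`, the random data and all variable names are assumptions made for the example, and the covariance uses the \(1/n\) convention of the total-variance formula.

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed so the run is repeatable
n, m, k = 200, 5, 2                       # illustrative sizes (assumptions)
X = rng.normal(size=(n, m)) @ rng.normal(size=(m, m))  # synthetic correlated data

X_bar = X - X.mean(axis=0)                # centre each column
C = X_bar.T @ X_bar / n                   # covariance matrix, 1/n convention

# Thin SVD: X_bar = U @ diag(s) @ Vt, singular values in decreasing order
U, s, Vt = np.linalg.svd(X_bar, full_matrices=False)

total_variance = np.sum(s**2) / n         # should match trace(C)
v1 = Vt[0]                                # first right singular vector
var_along_v1 = np.var(X_bar @ v1)         # variance of the data projected on v1

# Rank-k approximation of the centred data and the k-dimensional scores
X_bar_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
scores = X_bar @ Vt[:k, :].T              # n x k reduced representation
```

The `scores` array is the reduced data set; `X_bar_k` is the same information mapped back into the original \(m\)-dimensional space.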
Attention
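This approach minimises the sum of squared perpendicular distances from the data points to the subspace spanned by the leading singular vectors (not the vertical distances minimised by least-squares regression).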
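The perpendicular-distance property can be checked numerically. The sketch below uses synthetic data and a hypothetical helper `squared_residual` (both assumptions for illustration) to compare the residual of the subspace spanned by the first \(k\) right singular vectors with that of a random \(k\)-dimensional subspace.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, k = 100, 4, 2                        # illustrative sizes (assumptions)
X = rng.normal(size=(n, m)) * np.array([3.0, 2.0, 1.0, 0.5])
X_bar = X - X.mean(axis=0)                 # centred data

U, s, Vt = np.linalg.svd(X_bar, full_matrices=False)


def squared_residual(X_bar, basis):
    """Sum of squared perpendicular distances from the rows of X_bar to
    span(basis); `basis` must have orthonormal columns."""
    projection = X_bar @ basis @ basis.T
    return np.sum((X_bar - projection) ** 2)


# Subspace spanned by the first k right singular vectors
pca_residual = squared_residual(X_bar, Vt[:k].T)

# A random k-dimensional subspace for comparison, orthonormalised with QR
Q, _ = np.linalg.qr(rng.normal(size=(m, k)))
random_residual = squared_residual(X_bar, Q)
```

The residual of the singular-vector subspace should equal the sum of the discarded squared singular values, `np.sum(s[k:]**2)`, and should not exceed the residual of any other \(k\)-dimensional subspace.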
[SO] What exactly should be called “projection matrix” in the context of PCA?
Matrix Identities¶
Note
Matrix Inversion Lemma (Woodbury Identity):
\[(\mathbf{Z}+\mathbf{U}\mathbf{W}\mathbf{V}^T)^{-1}=\mathbf{Z}^{-1}-\mathbf{Z}^{-1}\mathbf{U}\mathbf{X}^{-1}\mathbf{V}^T\mathbf{Z}^{-1}\]
where \(\mathbf{X}=\mathbf{W}^{-1}+\mathbf{V}^T\mathbf{Z}^{-1}\mathbf{U}\).
If \(\mathbf{A}\), \(\mathbf{B}\) and \(\mathbf{A}+\mathbf{B}\) are invertible (for example, symmetric positive definite):
\[(\mathbf{A}^{-1}+\mathbf{B}^{-1})^{-1}=\mathbf{A}-\mathbf{A}(\mathbf{A}+\mathbf{B})^{-1}\mathbf{A}\]
1-D verification: \(\frac{1}{1/a+1/b}=\frac{ab}{a+b}=\frac{a^2-a^2+ab}{a+b}=\frac{a(a+b)-a^2}{a+b}=a-a(a+b)^{-1}a\)
For determinants:
\[|\mathbf{Z}+\mathbf{U}\mathbf{W}\mathbf{V}^T|=|\mathbf{Z}||\mathbf{W}||\mathbf{X}|\]
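The three identities can be sanity-checked numerically. The following is a minimal sketch on random matrices; the sizes `p` and `q`, the helper `random_spd` and the variable names are assumptions made for the example. The Woodbury check uses the standard form of the lemma, in which the inverse of \(\mathbf{X}=\mathbf{W}^{-1}+\mathbf{V}^T\mathbf{Z}^{-1}\mathbf{U}\) appears in the correction term.

```python
import numpy as np

rng = np.random.default_rng(2)
p, q = 5, 2                                # illustrative sizes (assumptions)


def random_spd(size):
    """Random symmetric positive definite matrix (hypothetical helper)."""
    A = rng.normal(size=(size, size))
    return A @ A.T + size * np.eye(size)


Z, W = random_spd(p), random_spd(q)
U = rng.normal(size=(p, q))
V = rng.normal(size=(p, q))

Z_inv = np.linalg.inv(Z)
X_mat = np.linalg.inv(W) + V.T @ Z_inv @ U   # X = W^{-1} + V^T Z^{-1} U

# Woodbury identity: invert a p x p update via a q x q inverse
lhs_woodbury = np.linalg.inv(Z + U @ W @ V.T)
rhs_woodbury = Z_inv - Z_inv @ U @ np.linalg.inv(X_mat) @ V.T @ Z_inv

# Inverse of a sum of inverses
A, B = random_spd(p), random_spd(p)
lhs_sum = np.linalg.inv(np.linalg.inv(A) + np.linalg.inv(B))
rhs_sum = A - A @ np.linalg.inv(A + B) @ A

# Determinant identity
lhs_det = np.linalg.det(Z + U @ W @ V.T)
rhs_det = np.linalg.det(Z) * np.linalg.det(W) * np.linalg.det(X_mat)
```

Comparing each `lhs_*` with its `rhs_*` using `np.allclose` (or `np.isclose` for the determinants) should agree up to floating-point error. The practical appeal of the Woodbury form is that, when \(q\ll p\) and \(\mathbf{Z}^{-1}\) is already known, only a \(q\times q\) matrix has to be inverted.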
Resources¶
Important
[Georgia Tech] Interactive Linear Algebra