Applied Linear Algebra

Principal Component Analysis

Note

We have a data matrix \(\mathbf{X}\in\mathbb{R}^{n\times m}\) that represents a sample of size \(n\) on \(m\) variables.
Each row is an observation \((\mathbf{x}^*_i)^\top\in\mathbb{R}^m\), \(i=1,\ldots,n\).
We want to reduce the dimensionality of the data to \(k\ll m\) while keeping as much information about the data as possible.

Note

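Let \(\bar{\mathbf{X}}\) be the centred version of \(\mathbf{X}\), obtained by subtracting the sample mean of each column.
Then \(\mathbf{C}=\bar{\mathbf{X}}^\top\bar{\mathbf{X}}\) is \(n\) times the sample variance-covariance matrix, i.e. \(\frac{1}{n}\mathbf{C}\) is the covariance estimate with divisor \(n\).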
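As a small illustration, here is how centring and forming \(\mathbf{C}\) look in NumPy. The synthetic data, the seed and the helper `centre` are arbitrary choices for this sketch. `np.cov` with `rowvar=False` treats columns as variables, and `bias=True` uses the divisor \(n\), so its result should match \(\frac{1}{n}\mathbf{C}\):

```python
import numpy as np

def centre(X):
    """Subtract the per-column (per-variable) sample mean from every row of X."""
    return X - X.mean(axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))     # n = 100 observations, m = 5 variables (synthetic)
Xbar = centre(X)
C = Xbar.T @ Xbar                 # m x m matrix C = Xbar^T Xbar

# NumPy's covariance with divisor n (bias=True), columns as variables.
S = np.cov(X, rowvar=False, bias=True)
print(np.allclose(C / X.shape[0], S))   # True
```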
The total variance is

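\[\mathbb{V}(\bar{\mathbf{X}})=\frac{1}{n}\sum_{i=1}^n\|\bar{\mathbf{x}}^*_i\|^2=\frac{1}{n}\text{trace}(\bar{\mathbf{X}}^\top\bar{\mathbf{X}})=\frac{1}{n}(\sigma_1^2+\cdots+\sigma_r^2)\]
where \(\sigma_1\ge\cdots\ge\sigma_r>0\) are the singular values of \(\bar{\mathbf{X}}=\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^\top\) and \(r=\text{rank}(\bar{\mathbf{X}})\le\min(n,m)\).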
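The variance of the data projected onto a unit vector \(\mathbf{v}\) is \(\frac{1}{n}\|\bar{\mathbf{X}}\mathbf{v}\|^2=\frac{1}{n}\mathbf{v}^\top\mathbf{C}\mathbf{v}\). This is maximised, with value \(\frac{1}{n}\sigma_1^2\), by the first right singular vector \(\mathbf{v}_1\) of \(\bar{\mathbf{X}}\) (equivalently, the eigenvector of \(\mathbf{C}\) with the largest eigenvalue \(\sigma_1^2\)). Therefore, \(\mathbf{v}_1\) gives the direction in which the variance of the data is maximised.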
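Both the total-variance identity and the maximal-variance property of \(\mathbf{v}_1\) can be checked numerically. The sketch below uses synthetic 2-D data, stretched along \((1,1)/\sqrt{2}\), and a hypothetical helper `variance_along`. It computes the total variance three ways and compares the variance along \(\mathbf{v}_1\) with the variance along many random directions:

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic 2-D data: std 3 along R[0] = (1, 1)/sqrt(2), std 0.5 along R[1] = (-1, 1)/sqrt(2).
R = np.array([[1.0, 1.0], [-1.0, 1.0]]) / np.sqrt(2)
X = (rng.normal(size=(500, 2)) * np.array([3.0, 0.5])) @ R
n = X.shape[0]
Xbar = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xbar, full_matrices=False)

# Total variance computed three ways.
tv_rows = np.sum(Xbar ** 2) / n           # (1/n) * sum_i ||xbar_i||^2
tv_trace = np.trace(Xbar.T @ Xbar) / n    # (1/n) * trace(Xbar^T Xbar)
tv_svd = np.sum(s ** 2) / n               # (1/n) * (sigma_1^2 + ... + sigma_r^2)
print(np.allclose([tv_rows, tv_trace], tv_svd))   # True

def variance_along(v):
    """Variance of the centred data projected onto the unit vector v / ||v||."""
    v = v / np.linalg.norm(v)
    return np.sum((Xbar @ v) ** 2) / n

v1 = Vt[0]   # first right singular vector, i.e. first column of V
print(np.isclose(variance_along(v1), s[0] ** 2 / n))   # True

# No random unit direction captures more variance than v1.
dirs = rng.normal(size=(1000, 2))
best_random = max(variance_along(d) for d in dirs)
print(best_random <= variance_along(v1) * (1 + 1e-9))  # True
```

Because NumPy's `var` uses the divisor \(n\) by default, `X.var(axis=0).sum()` gives the same total variance as a sum of per-variable variances.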
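Keeping the first \(k\) singular triplets gives the best rank-\(k\) approximation of the centred data (Eckart–Young theorem), \(\bar{\mathbf{X}}_k=\mathbf{U}_k\boldsymbol{\Sigma}_k\mathbf{V}_k^\top\). The reduced \(k\)-dimensional data are the scores \(\bar{\mathbf{X}}\mathbf{V}_k=\mathbf{U}_k\boldsymbol{\Sigma}_k\in\mathbb{R}^{n\times k}\). They retain the variance \(\frac{1}{n}(\sigma_1^2+\cdots+\sigma_k^2)\), which is the most that any \(k\)-dimensional projection can keep, and this achieves the goal.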
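A minimal SVD-based PCA might look as follows. The function name `pca` and its return values are hypothetical choices for this sketch, not a library API. It returns the scores \(\bar{\mathbf{X}}\mathbf{V}_k\), the principal directions \(\mathbf{V}_k\), and the rank-\(k\) approximation with the mean added back. It then checks the Eckart–Young error, \(\|\bar{\mathbf{X}}-\bar{\mathbf{X}}_k\|_F=\sqrt{\sigma_{k+1}^2+\cdots}\):

```python
import numpy as np

def pca(X, k):
    """Sketch of PCA via the SVD of the centred data.

    Returns (scores, Vk, X_approx, s): the n x k scores Xbar @ V_k, the m x k
    principal directions, the rank-k approximation of X (mean added back), and
    all singular values of Xbar.
    """
    mean = X.mean(axis=0)
    Xbar = X - mean
    U, s, Vt = np.linalg.svd(Xbar, full_matrices=False)
    Vk = Vt[:k].T                              # first k right singular vectors as columns
    scores = Xbar @ Vk                         # equals U[:, :k] * s[:k]
    Xbar_k = (U[:, :k] * s[:k]) @ Vt[:k]       # U_k Sigma_k V_k^T
    return scores, Vk, Xbar_k + mean, s

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 6))   # synthetic data
k = 2
scores, Vk, X_approx, s = pca(X, k)
print(scores.shape)             # (200, 2)

# Eckart-Young: Frobenius error equals the norm of the discarded singular values.
err = np.linalg.norm(X - X_approx)
print(np.isclose(err, np.sqrt(np.sum(s[k:] ** 2))))   # True
```

Adding the mean back does not change the error, since \(\mathbf{X}-(\bar{\mathbf{X}}_k+\text{mean})=\bar{\mathbf{X}}-\bar{\mathbf{X}}_k\).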

Attention

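This approach minimises the sum of squared perpendicular (orthogonal) distances from the centred data points to the subspace spanned by \(\mathbf{v}_1,\ldots,\mathbf{v}_k\), not the vertical distances minimised by least-squares regression.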
[SO] What exactly should be called “projection matrix” in the context of PCA?
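The perpendicular-distance claim can be checked numerically. In the sketch below, the data are synthetic and `perp_sq_dist` is a hypothetical helper. \(\mathbf{P}=\mathbf{V}_k\mathbf{V}_k^\top\) is the orthogonal projector onto the principal subspace, and the residual sum of squares is compared with the discarded singular values and with a random \(k\)-dimensional subspace:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 3)) @ np.diag([4.0, 2.0, 0.3])   # synthetic, unequal spreads
Xbar = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xbar, full_matrices=False)

def perp_sq_dist(Xbar, B):
    """Sum of squared perpendicular distances from the rows of Xbar to span(B).

    B must have orthonormal columns, so P = B B^T is the orthogonal projector.
    """
    P = B @ B.T
    residual = Xbar - Xbar @ P
    return np.sum(residual ** 2)

k = 2
d_pca = perp_sq_dist(Xbar, Vt[:k].T)
print(np.isclose(d_pca, np.sum(s[k:] ** 2)))   # True

# A random k-dimensional subspace (orthonormal basis via reduced QR) does no better.
Q, _ = np.linalg.qr(rng.normal(size=(3, k)))
d_rand = perp_sq_dist(Xbar, Q)
print(d_rand >= d_pca - 1e-9)                  # True
```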

Note

Matrix Inversion Lemma (Woodbury Identity):

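\[(\mathbf{Z}+\mathbf{U}\mathbf{W}\mathbf{V}^T)^{-1}=\mathbf{Z}^{-1}-\mathbf{Z}^{-1}\mathbf{U}\mathbf{X}^{-1}\mathbf{V}^T\mathbf{Z}^{-1}\]
- where \(\mathbf{X}=\mathbf{W}^{-1}+\mathbf{V}^T\mathbf{Z}^{-1}\mathbf{U}\) (a new matrix, unrelated to the data matrix above), and \(\mathbf{Z}\), \(\mathbf{W}\) and \(\mathbf{X}\) are assumed invertible.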
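A quick numerical check of the identity follows. The sizes, the seed and the scaling are arbitrary. \(\mathbf{Z}\) is built symmetric positive definite, and \(\mathbf{U}\), \(\mathbf{V}\) are kept small so that every matrix that must be inverted is well conditioned:

```python
import numpy as np

rng = np.random.default_rng(5)
p, q = 5, 2
A = rng.normal(size=(p, p))
Z = A @ A.T + p * np.eye(p)          # symmetric positive definite, hence invertible
U = 0.2 * rng.normal(size=(p, q))    # small rank-q update keeps things well conditioned
V = 0.2 * rng.normal(size=(p, q))
W = np.diag([2.0, 0.5])              # invertible q x q

Zinv = np.linalg.inv(Z)
X = np.linalg.inv(W) + V.T @ Zinv @ U    # the q x q matrix X of the lemma

lhs = np.linalg.inv(Z + U @ W @ V.T)                    # direct p x p inverse
rhs = Zinv - Zinv @ U @ np.linalg.inv(X) @ V.T @ Zinv   # needs only a q x q inverse beyond Z^{-1}
print(np.allclose(lhs, rhs))   # True
```

The practical appeal is visible in the right-hand side. When \(\mathbf{Z}^{-1}\) is cheap (e.g. \(\mathbf{Z}\) diagonal) and \(q\ll p\), only the small \(q\times q\) matrix \(\mathbf{X}\) has to be inverted.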
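Taking \(\mathbf{Z}=\mathbf{A}^{-1}\), \(\mathbf{W}=\mathbf{B}^{-1}\) and \(\mathbf{U}=\mathbf{V}=\mathbf{I}\) gives, for invertible \(\mathbf{A}\), \(\mathbf{B}\) and \(\mathbf{A}+\mathbf{B}\) (e.g. symmetric positive definite matrices such as covariances),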

\[(\mathbf{A}^{-1}+\mathbf{B}^{-1})^{-1}=\mathbf{A}-\mathbf{A}(\mathbf{A}+\mathbf{B})^{-1}\mathbf{A}\]
- 1d verification: \(\frac{1}{1/a+1/b}=\frac{ab}{a+b}=\frac{a^2-a^2+ab}{a+b}=\frac{a(a+b)-a^2}{a+b}=a-a(a+b)^{-1}a\)
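The scalar check generalises directly to matrices. The sketch below draws two random symmetric positive definite matrices, using a hypothetical helper `random_spd`, and compares both sides of the identity:

```python
import numpy as np

rng = np.random.default_rng(6)

def random_spd(p):
    """Random symmetric positive definite p x p matrix (illustrative helper)."""
    M = rng.normal(size=(p, p))
    return M @ M.T + p * np.eye(p)

A, B = random_spd(4), random_spd(4)
inv = np.linalg.inv

lhs = inv(inv(A) + inv(B))
rhs = A - A @ inv(A + B) @ A
print(np.allclose(lhs, rhs))   # True
```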
The companion result for determinants (matrix determinant lemma), with \(\mathbf{X}\) as above, is:

\[|\mathbf{Z}+\mathbf{U}\mathbf{W}\mathbf{V}^T|=|\mathbf{Z}||\mathbf{W}||\mathbf{X}|\]
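The determinant identity can be checked the same way. The setup below mirrors the arbitrary, well-conditioned construction used for the inverse, with \(\mathbf{X}=\mathbf{W}^{-1}+\mathbf{V}^T\mathbf{Z}^{-1}\mathbf{U}\):

```python
import numpy as np

rng = np.random.default_rng(7)
p, q = 5, 2
A = rng.normal(size=(p, p))
Z = A @ A.T + p * np.eye(p)          # symmetric positive definite
U = 0.2 * rng.normal(size=(p, q))
V = 0.2 * rng.normal(size=(p, q))
W = np.diag([2.0, 0.5])

X = np.linalg.inv(W) + V.T @ np.linalg.inv(Z) @ U

lhs = np.linalg.det(Z + U @ W @ V.T)                                # p x p determinant
rhs = np.linalg.det(Z) * np.linalg.det(W) * np.linalg.det(X)        # |Z| |W| |X|
print(np.isclose(lhs, rhs))   # True
```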

Warning


Important