Applied Linear Algebra

Principal Component Analysis

Note

  • We have a data matrix \(\mathbf{X}\in\mathbb{R}^{n\times m}\) which represents a sample of size \(n\).

  • Each row represents an observation \((\mathbf{x}^*_i)^\top\in\mathbb{R}^m\), \(i=1,\dots,n\).

  • We want to reduce the dimensionality of the data to \(k\ll m\) while retaining as much information about the data as possible.

Note

  • Let \(\bar{\mathbf{X}}\) be the centred version of \(\mathbf{X}\), obtained by subtracting the sample mean of each column.

  • Then \(\mathbf{C}=\bar{\mathbf{X}}^\top\bar{\mathbf{X}}\) is, up to the factor \(\frac{1}{n}\), the sample variance-covariance matrix.

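  • Total variance, where \(\sigma_1\ge\cdots\ge\sigma_r>0\) are the non-zero singular values of \(\bar{\mathbf{X}}\) and \(r\le\min(n,m)\) is its rank:

    \[\mathbb{V}(\bar{\mathbf{X}})=\frac{1}{n}\sum_{i=1}^n\|\bar{\mathbf{x}}^*_i\|^2=\frac{1}{n}\text{trace}(\bar{\mathbf{X}}^\top\bar{\mathbf{X}})=\frac{1}{n}(\sigma_1^2+\cdots+\sigma_r^2)\]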
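  • Therefore, \(\mathbf{v}_1\), the first right singular vector of \(\bar{\mathbf{X}}\) (equivalently, the leading eigenvector of \(\bar{\mathbf{X}}^\top\bar{\mathbf{X}}\)), provides the direction in which the variance of the data is maximised: for any unit vector \(\mathbf{w}\), the variance along \(\mathbf{w}\) is \(\frac{1}{n}\|\bar{\mathbf{X}}\mathbf{w}\|^2\le\frac{1}{n}\sigma_1^2\), with equality at \(\mathbf{w}=\mathbf{v}_1\).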

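  • The best rank-\(k\) approximation of the centred data is the truncated SVD \(\bar{\mathbf{X}}_k=\mathbf{U}_k\boldsymbol{\Sigma}_k\mathbf{V}_k^\top\), which achieves the goal: the \(k\)-dimensional representation of the data is \(\bar{\mathbf{X}}\mathbf{V}_k=\mathbf{U}_k\boldsymbol{\Sigma}_k\in\mathbb{R}^{n\times k}\).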
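
The steps above can be checked numerically. The following is a minimal NumPy sketch, not a reference implementation: the synthetic data, the sizes `n`, `m`, `k`, and all variable names are assumptions made for illustration. It centres the data, takes the thin SVD, and verifies the total-variance identity, the variance along \(\mathbf{v}_1\), and the error of the rank-\(k\) truncation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): n observations, m features.
n, m, k = 200, 5, 2
X = rng.normal(size=(n, m)) @ rng.normal(size=(m, m))

# Centre each column by subtracting its sample mean.
X_bar = X - X.mean(axis=0)

# Thin SVD of the centred data: X_bar = U @ diag(s) @ Vt.
U, s, Vt = np.linalg.svd(X_bar, full_matrices=False)

# Total variance: (1/n) trace(X_bar^T X_bar) = (1/n) * sum of squared singular values.
total_var = np.trace(X_bar.T @ X_bar) / n
assert np.isclose(total_var, np.sum(s**2) / n)

# The variance along the first right singular vector v_1 is sigma_1^2 / n.
v1 = Vt[0]
var_v1 = np.sum((X_bar @ v1) ** 2) / n
assert np.isclose(var_v1, s[0] ** 2 / n)

# Rank-k approximation (truncated SVD) and the k-dimensional scores.
X_bar_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]
scores = X_bar @ Vt[:k].T  # equals U_k @ Sigma_k, shape (n, k)

# The squared Frobenius error is the sum of the discarded squared singular values.
err = np.linalg.norm(X_bar - X_bar_k, "fro") ** 2
assert np.isclose(err, np.sum(s[k:] ** 2))

# Fraction of the total variance retained by the first k components.
explained = np.sum(s[:k] ** 2) / np.sum(s**2)
print(f"fraction of variance retained with k={k}: {explained:.3f}")
```

In practice `scores` is the reduced data set that would be passed on to later analysis, and `explained` is one common way to judge whether a given \(k\) keeps enough of the variance.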

Matrix Identities

Note

  • Matrix Inversion Lemma (Woodbury Identity):

    \[(\mathbf{Z}+\mathbf{U}\mathbf{W}\mathbf{V}^T)^{-1}=\mathbf{Z}^{-1}-\mathbf{Z}^{-1}\mathbf{U}\mathbf{X}^{-1}\mathbf{V}^T\mathbf{Z}^{-1}\]
    • Where \(\mathbf{X}=\mathbf{W}^{-1}+\mathbf{V}^T\mathbf{Z}^{-1}\mathbf{U}\) (not the data matrix of the PCA section), and \(\mathbf{Z}\), \(\mathbf{W}\) and \(\mathbf{X}\) are assumed invertible

  • If \(\mathbf{A}\) and \(\mathbf{B}\) are symmetric, and \(\mathbf{A}\), \(\mathbf{B}\) and \(\mathbf{A}+\mathbf{B}\) are invertible:

    \[(\mathbf{A}^{-1}+\mathbf{B}^{-1})^{-1}=\mathbf{A}-\mathbf{A}(\mathbf{A}+\mathbf{B})^{-1}\mathbf{A}\]
    • 1d verification: \(\frac{1}{1/a+1/b}=\frac{ab}{a+b}=\frac{a^2-a^2+ab}{a+b}=\frac{a(a+b)-a^2}{a+b}=a-a(a+b)^{-1}a\)

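  • For determinants, with \(\mathbf{X}=\mathbf{W}^{-1}+\mathbf{V}^T\mathbf{Z}^{-1}\mathbf{U}\) as in the Matrix Inversion Lemma: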

    \[|\mathbf{Z}+\mathbf{U}\mathbf{W}\mathbf{V}^T|=|\mathbf{Z}||\mathbf{W}||\mathbf{X}|\]
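
A quick numerical sanity check of the three identities can help catch sign or inverse slips. The sketch below uses NumPy with randomly generated matrices; the sizes, the helper `random_spd`, and the scaling of `U` and `V` are assumptions chosen only so that every matrix involved is comfortably invertible.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 6, 2  # Z is n x n, W is p x p, U and V are n x p


def random_spd(d):
    """Hypothetical helper: a random symmetric positive-definite matrix."""
    M = rng.normal(size=(d, d))
    return M @ M.T + d * np.eye(d)


Z, W = random_spd(n), random_spd(p)
# Scaled down so that U W V^T is a small perturbation of Z.
U = 0.1 * rng.normal(size=(n, p))
V = 0.1 * rng.normal(size=(n, p))

Z_inv = np.linalg.inv(Z)
X = np.linalg.inv(W) + V.T @ Z_inv @ U  # p x p matrix from the lemma

# Woodbury identity: only a p x p inverse is needed once Z^{-1} is known.
lhs = np.linalg.inv(Z + U @ W @ V.T)
rhs = Z_inv - Z_inv @ U @ np.linalg.inv(X) @ V.T @ Z_inv
woodbury_ok = np.allclose(lhs, rhs)

# Identity for two symmetric invertible matrices A and B.
A, B = random_spd(n), random_spd(n)
lhs_sym = np.linalg.inv(np.linalg.inv(A) + np.linalg.inv(B))
rhs_sym = A - A @ np.linalg.inv(A + B) @ A
symmetric_ok = np.allclose(lhs_sym, rhs_sym)

# Determinant identity: |Z + U W V^T| = |Z| |W| |X|.
det_lhs = np.linalg.det(Z + U @ W @ V.T)
det_rhs = np.linalg.det(Z) * np.linalg.det(W) * np.linalg.det(X)
det_ok = np.isclose(det_lhs, det_rhs)

print(woodbury_ok, symmetric_ok, det_ok)
```

The Woodbury form is mainly useful when \(p\ll n\) and \(\mathbf{Z}^{-1}\) is cheap to obtain (for example, when \(\mathbf{Z}\) is diagonal), since the only new inverse required is of the \(p\times p\) matrix \(\mathbf{X}\).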