Matrix Factorisation

A=CR

This factorisation keeps the columns of the original matrix intact.

Note

  • Let the column matrix be \(\mathbf{C_0}=[]\)

  • For \(i=1\) to \(r\):

    • Select column \(\mathbf{a}_i\) if \(\mathbf{a}_i\notin\text{span}(C_i)\)

    • Update \(\mathbf{C_i}=\begin{bmatrix}\mathbf{C_{i-1}}\\ \mathbf{a}_i\end{bmatrix}\)

  • To find \(R\):

    • For the columns of \(\mathbf{A}\) that are already in \(\mathbf{C}\), the row would have a 1 to select that column and 0 everywhere else.

    • For the dependent columns, we put the right coefficients which recreates the column from others above it.

Attention

  • The column vectors in \(\mathbf{C}\) create one of the basis for \(C(\mathbf{A})\).

Tip

  • If the matrix is made of data, then this is desirable as it preserves the original columns.

  • A similar factorisation can also be achieved using original rows as well, \(\mathbf{A}=\mathbf{C}\mathbf{M}\mathbf{R}\) where \(\mathbf{R}\) consists of indepoendent row-vectors and \(\mathbf{M}_{r\times r}\) is a mixing matrix.

LU Factorisation

Note

  • Applicable for square matrices where the matrix can be thought of as \(n\) equations with \(n\) unknowns.

  • It is an iterative process where every step we peel off the first column and first row of the remaining matrix until we reach a \(1\times 1\) matrix.

    \[\begin{split}\mathbf{A}=\begin{bmatrix}a_{1,1}&a_{1,2}&\dots&a_{1,n}\\a_{2,1}&a_{2,2}&\dots&a_{2,n}\\\vdots&\vdots&\dots&\vdots\\a_{n,1}&a_{n,2}&\dots&a_{n,n}\end{bmatrix}=\begin{bmatrix}1\\\frac{a_{2,1}}{a_{1,1}}\\\vdots\\\frac{a_{n,1}}{a_{1,1}}\end{bmatrix}\begin{bmatrix}a_{1,1}&a_{1,2}&\dots&a_{1,n}\end{bmatrix}+\begin{bmatrix}0 & 0 & \dots & 0\\0 & \left(a_{2,2}-\frac{a_{2,1}a_{1,2}}{a_{1,1}}\right) & \dots & \left(a_{2,n}-\frac{a_{2,1}a_{1,n}}{a_{1,1}}\right)\\\vdots&\vdots&\dots&\vdots\\0 & \left(a_{n,2}-\frac{a_{n,1}a_{1,2}}{a_{1,1}}\right) & \dots & \left(a_{n,n}-\frac{a_{n,1}a_{1,n}}{a_{1,1}}\right)\end{bmatrix}=\mathbf{l}_1\mathbf{u}^*_1+\begin{bmatrix}0 & \dots\\\vdots & \mathbf{A}_2\end{bmatrix}\end{split}\]
  • At the end of the process, we’re left with the sum of all rank 1 matrices

    \[\begin{split}\mathbf{A}=\mathbf{l}_1\mathbf{u}^*_1+\cdots+\mathbf{l}_n\mathbf{u}^*_n=\begin{bmatrix}|&\cdots&|\\\mathbf{l}_1&\cdots&\mathbf{l}_n\\|&\cdots&|\end{bmatrix}\begin{bmatrix}-&\mathbf{u}^*_1&-\\\vdots&\vdots&\vdots\\-&\mathbf{u}^*_n&-\end{bmatrix}=\mathbf{L}\mathbf{U}\end{split}\]
  • The matrix \(\mathbf{L}\) is lower triangular with \(1s\) in its diagonal, where as the matrix \(\mathbf{U}\) is upper triangular.

LDLT Factorisation

Note

  • For symmetric matrices \(\mathbf{A}\), the \(\mathbf{U}\) factor can be expressed in terms of \(\mathbf{L}\) after pulling out the diagonal elements to make it all \(1s\).

    \[\mathbf{A}=\mathbf{L}\mathbf{D}\mathbf{L}^\top\]

Cholesky Factorisation

Note

  • We can take square root of the diagonal and push inside the \(\mathbf{L}\) to obtain factorisation in the form

    \[\mathbf{A}=\mathbf{L}\mathbf{L}^\top\]

Gram-Schmidt Orgthogonalisation

Eigendecomposition

Note

  • When we’re dealing with square matrices \(\mathbf{A}_{n\times n}\), we can think of the matrix transforming the input space by rotating or stretching or flipping every vector in the entire space.

  • Under such a linear transform, there are certain directions which stay fixed (they don’t rotate). They just get stretched or flipped.

  • For any vector along these directions, the effect of matrix multiplication is just the same as scalar multiplication which stretches/flips the vectors.

    \[\mathbf{A}\mathbf{x}=\lambda\mathbf{x}\]
    • \(\lambda\) is a eigenvalue and \(\mathbf{x}\) is a eigenvector of \(\mathbf{A}\).

    • These are determined entirely by the matrix of the linear transform \(\mathbf{A}\).

  • If all such vectors are collected in a matrix, then

    \[\begin{split}\mathbf{A}\mathbf{X}=\mathbf{A}\begin{bmatrix}|&|&\cdots&|\\\mathbf{x}_1&\mathbf{x}_2&\cdots&\mathbf{x}_n\\|&|&\cdots&|\end{bmatrix}=\begin{bmatrix}|&|&\cdots&|\\\mathbf{A}\mathbf{x}_1&\mathbf{A}\mathbf{x}_2&\cdots&\mathbf{A}\mathbf{x}_n\\|&|&\cdots&|\end{bmatrix}=\begin{bmatrix}|&|&\cdots&|\\\lambda_1\mathbf{x}_1&\lambda_2\mathbf{x}_2&\cdots&\lambda_n\mathbf{x}_n\\|&|&\cdots&|\end{bmatrix}=\begin{bmatrix}|&|&\cdots&|\\\mathbf{x}_1&\mathbf{x}_2&\cdots&\mathbf{x}_n\\|&|&\cdots&|\end{bmatrix}\begin{bmatrix}\lambda_1 & 0 & \dots & 0 \\ 0 & \lambda_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \lambda_n\end{bmatrix}=\mathbf{X}\boldsymbol{\Lambda}\end{split}\]
  • Therefore, the matrix factorises as

    \[\mathbf{A}=\mathbf{A}(\mathbf{X}\mathbf{X}^{-1})=(\mathbf{A}\mathbf{X})\mathbf{X}^{-1}=(\mathbf{X}\boldsymbol{\Lambda})\mathbf{X}^{-1}=\mathbf{X}\boldsymbol{\Lambda}\mathbf{X}^{-1}\]

Real and Complex Eigenvalues

Note

  • The eigenvalues can be real or complex.

    • Symmetric matrices have real eigenvalues

    • Orthogonal matrices have complex eigenvalues.

Matrix power

\[\mathbf{A}^n\mathbf{u}=(\mathbf{X}\boldsymbol{\Lambda}\mathbf{X}^{-1})^n\mathbf{u}=(\mathbf{X}\boldsymbol{\Lambda}\mathbf{X}^{-1})(\mathbf{X}\boldsymbol{\Lambda}\mathbf{X}^{-1})\cdots(\mathbf{X}\boldsymbol{\Lambda}\mathbf{X}^{-1})\mathbf{u}=(\mathbf{X}\boldsymbol{\Lambda}^n\mathbf{X}^{-1})\mathbf{u}\]

Attention

  • For the eigenvalues that are real, the vectors get stretched repeatedly (and flipped alternatively, if eigenvalues are negative) along that direction as the effect is the same as multiplication by a real number.

  • For the eigenvalues that are complex, the vectors oscilate as the effect is the same as multiplication by a complex number.

Tip

\(\exp(\mathbf{A})=\mathbf{X}\exp(\boldsymbol{\Lambda})\mathbf{X}^{-1}\)

Trace and Determinant

Note

  • Trace: \(\sum_{i=1}^n\lambda_i\)

  • Determinant: \(\prod_{i=1}^n\lambda_i\)

Similar Matrices

Note

  • The matrix \(\mathbf{A}\) and any other matrix in the form \(\mathbf{M}=\mathbf{B}\mathbf{A}\mathbf{B}^{-1}\) have the same eigenvalues.

  • The eigenvectors corresponding to each such \(\lambda\) is obtained by \(\mathbf{B}\mathbf{x}\) whenever \(\mathbf{A}\mathbf{x}=\lambda\mathbf{x}\)

    \[(\mathbf{B}\mathbf{A}\mathbf{B}^{-1})(\mathbf{B}\mathbf{x})=\mathbf{B}\mathbf{A}(\mathbf{B}^{-1}\mathbf{B})\mathbf{x}=\mathbf{B}\mathbf{A}\mathbf{x}=\lambda\mathbf{B}\mathbf{x}\]
  • So \(\mathbf{A}\) and \(\mathbf{M}\) are called similar matrices.

    • They stretch/flip the vectors in the same fashion, but in a different orientation.

Properties

Warning

  • It is not necessary that the eigenvectors are orthogonal.

    • Eigenvectors are orthogonal \(\iff\mathbf{A}\mathbf{A}^\top=\mathbf{A}^\top\mathbf{A}\)

  • It is not necessary that the eigenvalues are all distinct.

    • All eigenvalues are distinct \(\iff\) the matrix is full rank.

  • Double eigenvalues \(\lambda_i=\lambda_j\) might or might not have independent eigenvectors.

  • In general

    • \(\lambda(\mathbf{A}+\mathbf{B})\neq\lambda(\mathbf{A})+\lambda(\mathbf{B})\)

    • \(\lambda(\mathbf{A}\mathbf{B})\neq\lambda(\mathbf{A})\cdot\lambda(\mathbf{B})\)

Tip

For \(\mathbf{B}=\mathbf{A}-a\cdot\mathbf{I}\), \(\lambda(\mathbf{B})=\lambda(\mathbf{A})-a\)

Special case: Symmetric Real Matrices

Note

  • For real symmetric matrices \(\mathbf{S}\)

    • The eigenvalues are all real

      • Proof Hint: Multiply with complex conjugate of eigenvectors.

        • Let \(\bar{\mathbf{x}}=\begin{bmatrix}\bar{x_1}\\\vdots\\\bar{x_n}\end{bmatrix}=\begin{bmatrix}a_1-ib_1\\\vdots\\a_n-ib_n\end{bmatrix}\) be the complex conjugate of the eigenvector \(\mathbf{x}=\begin{bmatrix}x_1\\\vdots\\x_n\end{bmatrix}=\begin{bmatrix}a_1+ib_1\\\vdots\\a_n+ib_n\end{bmatrix}\in\mathbb{C}^n\).

        • We have \(\bar{\mathbf{x}}^\top\mathbf{S}\mathbf{x}=\lambda\bar{\mathbf{x}}^\top\mathbf{x}\)

        • From RHS: \(\sum_{i=1}^n\bar{x_i}x_i=\sum_{i=1}^n a_i^2+b_i^2\), all real.

        • The LHS: \(S_{1,1}(\bar{x_1}x_1)+S_{1,2}(\bar{x_1}x_2+\bar{x_2}x_1)+\cdots\).

        • Terms of the form \(S_{i,i}(\bar{x_i}x_i)\) are all real.

        • Terms of the form \(S_{i,j}(\bar{x_i}x_j+\bar{x_j}x_i)=S_{i,j}\left((a_i-ib_i)(a_i+ib_i)+(a_i+ib_i)(a_i-ib_i)\right)\) which is also real.

        • Therefore, \(\lambda\) must be real.

    • The eigenvectors are orthogonal

      • Proof Hint: Involve null-space and utilise the fact that for symmetric matrices, row-space and column-space are the same.

        • For some \(i\neq j\), let \(\lambda_i\) and \(\lambda_j\) be two eigenvalues with corresponding eigenvectors \(\mathbf{x}_i\) and \(\mathbf{x}_j\).

        • We have \((\mathbf{S}-\lambda_i\mathbf{I})\mathbf{x}_i=\mathbf{0}\). Therefore

          \[\mathbf{x}_i\in N(\mathbf{S}-\lambda_i\mathbf{I})\]
        • We also have \((\mathbf{S}-\lambda_i\mathbf{I})\mathbf{x}_j=(\lambda_j-\lambda_i)\mathbf{x}_j\). Therefore

          \[\mathbf{x}_j\in C(\mathbf{S}-\lambda_i\mathbf{I})=C((\mathbf{S}-\lambda_i\mathbf{I})^\top)\]
        • Therefore, \(\mathbf{x}_i\mathop{\bot}\mathbf{x}_j\) for \(i\neq j\).

Tip

  • We usually write \(\mathbf{S}=\mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^\top\)

  • Every matrix in this form is symmetric

    \[\mathbf{S}^\top=(\mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^\top)^\top=(\mathbf{Q}^\top)^\top\boldsymbol{\Lambda}^\top\mathbf{Q}^\top=\mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^\top=\mathbf{S}\]

Positive Definite Matrices

Multiplication by a pd matrix is similar to multiplying by a positive real number.

Tests for positive definiteness

Note

  • All eigenvalues are positive.

  • Quadratic Form: For any vector \(\mathbf{x}\neq\mathbf{0}\), \(\mathbf{x}^\top\mathbf{S}\mathbf{x} > 0\).

  • The matrix \(\mathbf{S}\) can be factorised as \(\mathbf{S}=\mathbf{A}^\top\mathbf{A}\).

    • Choices for \(\mathbf{A}\) can be

      • \(\mathbf{S}=\mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^\top=(\sqrt{\boldsymbol{\Lambda}}\mathbf{Q}^\top)^\top(\sqrt{\boldsymbol{\Lambda}}\mathbf{Q}^\top)=\mathbf{A}^\top\mathbf{A}\)

      • \(\mathbf{S}=\mathbf{L}\mathbf{L}^\top\)

  • The leading determinants \(D_1,D_2,\cdots,D_n\) are all positive.

  • In LU elimination, the pivot elements are all positive.

Positive Semi-definite Matrices

Note

  • All eigenvalues are \(\geq 0\)

  • Quadratic Form: For any vector \(\mathbf{x}\neq\mathbf{0}\), \(\mathbf{x}^\top\mathbf{S}\mathbf{x} \geq 0\).

Singular Value Decomposition

Comparison with Eigen stuff

Eigenvalues/vectors

Singular values/vectors

Works with \(\mathbf{A}:\mathbb{R}^n\mapsto\mathbb{R}^n\)

Works with \(\mathbf{A}:\mathbb{R}^n\mapsto\mathbb{R}^m\)

Eigenvalues can be real or complex.

Singular values are non-negative real numbers.

Eigenvectors are not always to be orthogonal to one another.

Singular vectors are orthonormal.

Tip

  • Eigen decompositon finds directions in the \(\mathbb{R}^n\) (input=output) space that are invariant under the operator. Along those directions the operator acts the same as a scalar multiplier.

  • Singular value decomposition finds directions in the input space \(\mathbb{R}^n\) and a different set of directions in the output space \(\mathbb{R}^m\) such that the operator produces a scaled version of these output vectors when applied to any vector in those input directions.

Note

  • Let \(\mathbf{v}\in\mathbb{R}^n\) a singular vector in the input dimension, \(\mathbf{u}\in\mathbb{R}^m\) a singular vector in the output dimension, and \(\sigma\) be the singular value for the matrix \(\mathbf{A}\). Then

    \[\mathbf{A}\mathbf{v}=\sigma\mathbf{u}\]
  • If all such vectors are collected in a matrix, then

    \[\mathbf{A}\mathbf{V}=\mathbf{U}\boldsymbol{\Sigma}\implies\mathbf{A}=\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^\top\]
  • We note that \(\mathbf{V}\) and \(\mathbf{U}\) are not square, not are matrices with orthonormal columns while \(\mathbf{\Sigma}\) is a square, diagonal matrix containing all the singular values.

Computing singular values and vectors

We formulate SVD via Eigendecomposition.

Note

  • Let \(\mathbf{M}=\mathbf{A}^\top\mathbf{A}\) and \(\mathbf{N}=\mathbf{A}\mathbf{A}^\top\).

    • We note that \(\mathbf{M}\) and \(\mathbf{N}\) are symetric, hence have real eigenvalues and orthonormal eigenvectors.

      \[\begin{split}\mathbf{M}^\top=(\mathbf{A}^\top\mathbf{A})^\top=\mathbf{A}^\top\mathbf{A}=\mathbf{M}\\\mathbf{N}^\top=(\mathbf{A}\mathbf{A}^\top)^\top=\mathbf{A}\mathbf{A}^\top=\mathbf{N}\end{split}\]
  • Reformualtion in terms of eigen decomposition

    • \(\mathbf{M}=\mathbf{A}^\top\mathbf{A}=(\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^\top)^\top\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^\top=\mathbf{V}\boldsymbol{\Sigma}^\top(\mathbf{U}^\top\mathbf{U})\boldsymbol{\Sigma}\mathbf{V}^\top=\mathbf{V}\boldsymbol{\Sigma}^2\mathbf{V}^\top\)

    • Similarly, \(\mathbf{N}=\mathbf{U}\boldsymbol{\Sigma}^2\mathbf{U}^\top\)

  • From the 2 eigen decompositions we obtain

    • Singular values: the diagonal matrix \(\boldsymbol{\Sigma}=\sqrt{\boldsymbol{\Lambda}}\) and

    • Singular vectors: the set of vectors \(\mathbf{U}\) and \(\mathbf{V}\).

  • Since each \(\sigma=\sqrt{\lambda}\), we need to ensure that \(\lambda\geq 0\).

    • \(\mathbf{x}^\top\mathbf{M}\mathbf{x}=\mathbf{x}^\top\mathbf{A}^\top\mathbf{A}\mathbf{x}=||\mathbf{A}\mathbf{x}||\geq 0\)

    • \(\mathbf{y}^\top\mathbf{N}\mathbf{y}=\mathbf{y}^\top\mathbf{A}\mathbf{A}^\top\mathbf{y}=||\mathbf{A}^\top\mathbf{y}||\geq 0\)

    • Therefore, \(\mathbf{M}\) and \(\mathbf{N}\) are positive semi definite, so \(\sqrt{\lambda}\) is real.

Tip

  • Singular values are arranged in descending order, \(\sigma_1\geq\sigma_2\geq\cdots\sigma_r\geq 0=\cdots 0\), where \(r\) is the rank of the matrix.

  • Any linear transformation \(\mathbf{A}=\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^T\) can be thought of as a

    • pure rotation/flip

    • stretching

    • another pure rotation/flip

  • All the stretching happens due to \(\boldsymbol{\Sigma}\).

  • For any vector \(\mathbf{x}\), \(||\mathbf{A}\mathbf{x}||\leq \sigma_1||\mathbf{x}||\) where \(\sigma_1\) is the first singular value.

    • Proof Hint:

      • We have \(||\mathbf{A}\mathbf{x}||=||\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^T\mathbf{x}||\)

      • Since \(\mathbf{U}\) and \(\mathbf{V}\) are matrices with orthonormal columns, they don’t change the length.

      • Therefore, \(||\mathbf{A}\mathbf{x}||=||\boldsymbol{\Sigma}\mathbf{x}||=\sigma_1 x_1+\cdots+\sigma_n x_n\leq \sigma_1(x_1+\cdots+x_n)=\sigma_1||\mathbf{x}||\)

Attention

  • If \(A\) is a symmetric positive definite matrix, then its SVD is given by its eigen decomposition.

  • If \(A\) is a matrix with orthonormal columns, then all its singular values are 1.

Eckart-Young: Best rank k approximation

Attention

  • Let \(\mathbf{A}_k=\sigma_1\mathbf{u}_1\mathbf{v}_1^\top+\cdots+\sigma_k\mathbf{u}_k\mathbf{v}_k^\top\) for some \(k\leq r\).

  • Let \(\mathbf{B}\) be any rank \(k\) matrix.

  • We have \(||\mathbf{A}-\mathbf{A}_k||\leq ||\mathbf{A}-\mathbf{B}||\)