################################################################################
Matrix Factorisation
################################################################################

********************************************************************************
A=CR
********************************************************************************
This factorisation keeps the columns of the original matrix intact.

.. note::
	* Let the column matrix be :math:`\mathbf{C_0}=[]`
	* For :math:`i=1` to :math:`r`:

		* Select column :math:`\mathbf{a}_i` if :math:`\mathbf{a}_i\notin\text{span}(C_i)`
		* Update :math:`\mathbf{C_i}=\begin{bmatrix}\mathbf{C_{i-1}}\\ \mathbf{a}_i\end{bmatrix}`
	* To find :math:`R`:

		* For the columns of :math:`\mathbf{A}` that are already in :math:`\mathbf{C}`, the row would have a 1 to select that column and 0 everywhere else.
		* For the dependent columns, we put the right coefficients which recreates the column from others above it.

.. attention::
	* The column vectors in :math:`\mathbf{C}` create one of the basis for :math:`C(\mathbf{A})`.

.. tip::
	* If the matrix is made of data, then this is desirable as it preserves the original columns.
	* A similar factorisation can also be achieved using original rows as well, :math:`\mathbf{A}=\mathbf{C}\mathbf{M}\mathbf{R}` where :math:`\mathbf{R}` consists of indepoendent row-vectors and :math:`\mathbf{M}_{r\times r}` is a mixing matrix.

********************************************************************************
LU Factorisation
********************************************************************************
.. note::
	* Applicable for square matrices where the matrix can be thought of as :math:`n` equations with :math:`n` unknowns.
	* It is an iterative process where every step we peel off the first column and first row of the remaining matrix until we reach a :math:`1\times 1` matrix.

		.. math:: \mathbf{A}=\begin{bmatrix}a_{1,1}&a_{1,2}&\dots&a_{1,n}\\a_{2,1}&a_{2,2}&\dots&a_{2,n}\\\vdots&\vdots&\dots&\vdots\\a_{n,1}&a_{n,2}&\dots&a_{n,n}\end{bmatrix}=\begin{bmatrix}1\\\frac{a_{2,1}}{a_{1,1}}\\\vdots\\\frac{a_{n,1}}{a_{1,1}}\end{bmatrix}\begin{bmatrix}a_{1,1}&a_{1,2}&\dots&a_{1,n}\end{bmatrix}+\begin{bmatrix}0 & 0 & \dots & 0\\0 & \left(a_{2,2}-\frac{a_{2,1}a_{1,2}}{a_{1,1}}\right) & \dots & \left(a_{2,n}-\frac{a_{2,1}a_{1,n}}{a_{1,1}}\right)\\\vdots&\vdots&\dots&\vdots\\0 & \left(a_{n,2}-\frac{a_{n,1}a_{1,2}}{a_{1,1}}\right) & \dots & \left(a_{n,n}-\frac{a_{n,1}a_{1,n}}{a_{1,1}}\right)\end{bmatrix}=\mathbf{l}_1\mathbf{u}^*_1+\begin{bmatrix}0 & \dots\\\vdots & \mathbf{A}_2\end{bmatrix}
	* At the end of the process, we're left with the sum of all rank 1 matrices

		.. math:: \mathbf{A}=\mathbf{l}_1\mathbf{u}^*_1+\cdots+\mathbf{l}_n\mathbf{u}^*_n=\begin{bmatrix}|&\cdots&|\\\mathbf{l}_1&\cdots&\mathbf{l}_n\\|&\cdots&|\end{bmatrix}\begin{bmatrix}-&\mathbf{u}^*_1&-\\\vdots&\vdots&\vdots\\-&\mathbf{u}^*_n&-\end{bmatrix}=\mathbf{L}\mathbf{U}
	* The matrix :math:`\mathbf{L}` is lower triangular with :math:`1s` in its diagonal, where as the matrix :math:`\mathbf{U}` is upper triangular.

********************************************************************************
LDLT Factorisation
********************************************************************************
.. note::
	* For symmetric matrices :math:`\mathbf{A}`, the :math:`\mathbf{U}` factor can be expressed in terms of :math:`\mathbf{L}` after pulling out the diagonal elements to make it all :math:`1s`.

		.. math:: \mathbf{A}=\mathbf{L}\mathbf{D}\mathbf{L}^\top

********************************************************************************
Cholesky Factorisation
********************************************************************************
.. note::
	* We can take square root of the diagonal and push inside the :math:`\mathbf{L}` to obtain factorisation in the form

		.. math:: \mathbf{A}=\mathbf{L}\mathbf{L}^\top

********************************************************************************
Gram-Schmidt Orgthogonalisation
********************************************************************************

********************************************************************************
Eigendecomposition
********************************************************************************
.. note::
	* When we're dealing with square matrices :math:`\mathbf{A}_{n\times n}`, we can think of the matrix transforming the input space by rotating or stretching or flipping every vector in the entire space.
	* Under such a linear transform, there are certain directions which stay fixed (they don't rotate). They just get stretched or flipped.
	* For any vector along these directions, the effect of matrix multiplication is just the same as scalar multiplication which stretches/flips the vectors.

		.. math:: \mathbf{A}\mathbf{x}=\lambda\mathbf{x}

		* :math:`\lambda` is a eigenvalue and :math:`\mathbf{x}` is a eigenvector of :math:`\mathbf{A}`.
		* These are determined entirely by the matrix of the linear transform :math:`\mathbf{A}`.
	* If all such vectors are collected in a matrix, then

		.. math:: \mathbf{A}\mathbf{X}=\mathbf{A}\begin{bmatrix}|&|&\cdots&|\\\mathbf{x}_1&\mathbf{x}_2&\cdots&\mathbf{x}_n\\|&|&\cdots&|\end{bmatrix}=\begin{bmatrix}|&|&\cdots&|\\\mathbf{A}\mathbf{x}_1&\mathbf{A}\mathbf{x}_2&\cdots&\mathbf{A}\mathbf{x}_n\\|&|&\cdots&|\end{bmatrix}=\begin{bmatrix}|&|&\cdots&|\\\lambda_1\mathbf{x}_1&\lambda_2\mathbf{x}_2&\cdots&\lambda_n\mathbf{x}_n\\|&|&\cdots&|\end{bmatrix}=\begin{bmatrix}|&|&\cdots&|\\\mathbf{x}_1&\mathbf{x}_2&\cdots&\mathbf{x}_n\\|&|&\cdots&|\end{bmatrix}\begin{bmatrix}\lambda_1 & 0 & \dots & 0 \\ 0 & \lambda_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \lambda_n\end{bmatrix}=\mathbf{X}\boldsymbol{\Lambda}

	* Therefore, the matrix factorises as 

		.. math:: \mathbf{A}=\mathbf{A}(\mathbf{X}\mathbf{X}^{-1})=(\mathbf{A}\mathbf{X})\mathbf{X}^{-1}=(\mathbf{X}\boldsymbol{\Lambda})\mathbf{X}^{-1}=\mathbf{X}\boldsymbol{\Lambda}\mathbf{X}^{-1}

Real and Complex Eigenvalues
================================================================================
.. note::
	* The eigenvalues can be real or complex.
		
		* **Symmetric matrices have real eigenvalues** 
		* **Orthogonal matrices have complex eigenvalues**.

Matrix power
================================================================================
	.. math:: \mathbf{A}^n\mathbf{u}=(\mathbf{X}\boldsymbol{\Lambda}\mathbf{X}^{-1})^n\mathbf{u}=(\mathbf{X}\boldsymbol{\Lambda}\mathbf{X}^{-1})(\mathbf{X}\boldsymbol{\Lambda}\mathbf{X}^{-1})\cdots(\mathbf{X}\boldsymbol{\Lambda}\mathbf{X}^{-1})\mathbf{u}=(\mathbf{X}\boldsymbol{\Lambda}^n\mathbf{X}^{-1})\mathbf{u}

.. attention::
	* For the eigenvalues that are real, the vectors get stretched repeatedly (and flipped alternatively, if eigenvalues are negative) along that direction as the effect is the same as multiplication by a real number.
	* For the eigenvalues that are complex, the vectors oscilate as the effect is the same as multiplication by a complex number.

.. tip::
	:math:`\exp(\mathbf{A})=\mathbf{X}\exp(\boldsymbol{\Lambda})\mathbf{X}^{-1}`

Trace and Determinant
================================================================================
.. note::
	* **Trace**: :math:`\sum_{i=1}^n\lambda_i`
	* **Determinant**: :math:`\prod_{i=1}^n\lambda_i`

Similar Matrices
================================================================================
.. note::
	* The matrix :math:`\mathbf{A}` and any other matrix in the form :math:`\mathbf{M}=\mathbf{B}\mathbf{A}\mathbf{B}^{-1}` have the same eigenvalues.
	* The eigenvectors corresponding to each such :math:`\lambda` is obtained by :math:`\mathbf{B}\mathbf{x}` whenever :math:`\mathbf{A}\mathbf{x}=\lambda\mathbf{x}`

		.. math:: (\mathbf{B}\mathbf{A}\mathbf{B}^{-1})(\mathbf{B}\mathbf{x})=\mathbf{B}\mathbf{A}(\mathbf{B}^{-1}\mathbf{B})\mathbf{x}=\mathbf{B}\mathbf{A}\mathbf{x}=\lambda\mathbf{B}\mathbf{x}
	* So :math:`\mathbf{A}` and :math:`\mathbf{M}` are called **similar matrices**.

		* They stretch/flip the vectors in the same fashion, but in a different orientation.

Properties
================================================================================
.. warning::
	* It is not necessary that the eigenvectors are orthogonal.

		* Eigenvectors are orthogonal :math:`\iff\mathbf{A}\mathbf{A}^\top=\mathbf{A}^\top\mathbf{A}`
	* It is not necessary that the eigenvalues are all distinct.

		* All eigenvalues are distinct :math:`\iff` the matrix is full rank.
	* Double eigenvalues :math:`\lambda_i=\lambda_j` might or might not have independent eigenvectors.
	* **In general**

		* :math:`\lambda(\mathbf{A}+\mathbf{B})\neq\lambda(\mathbf{A})+\lambda(\mathbf{B})`
		* :math:`\lambda(\mathbf{A}\mathbf{B})\neq\lambda(\mathbf{A})\cdot\lambda(\mathbf{B})`

.. tip::
	For :math:`\mathbf{B}=\mathbf{A}-a\cdot\mathbf{I}`, :math:`\lambda(\mathbf{B})=\lambda(\mathbf{A})-a`

Special case: Symmetric Real Matrices
================================================================================
.. note::
	* For real symmetric matrices :math:`\mathbf{S}`

		* The eigenvalues are all real
			
			* Proof Hint: Multiply with complex conjugate of eigenvectors.

				* Let :math:`\bar{\mathbf{x}}=\begin{bmatrix}\bar{x_1}\\\vdots\\\bar{x_n}\end{bmatrix}=\begin{bmatrix}a_1-ib_1\\\vdots\\a_n-ib_n\end{bmatrix}` be the complex conjugate of the eigenvector :math:`\mathbf{x}=\begin{bmatrix}x_1\\\vdots\\x_n\end{bmatrix}=\begin{bmatrix}a_1+ib_1\\\vdots\\a_n+ib_n\end{bmatrix}\in\mathbb{C}^n`.
				* We have :math:`\bar{\mathbf{x}}^\top\mathbf{S}\mathbf{x}=\lambda\bar{\mathbf{x}}^\top\mathbf{x}`
				* From RHS: :math:`\sum_{i=1}^n\bar{x_i}x_i=\sum_{i=1}^n a_i^2+b_i^2`, all real.
				* The LHS: :math:`S_{1,1}(\bar{x_1}x_1)+S_{1,2}(\bar{x_1}x_2+\bar{x_2}x_1)+\cdots`.
				* Terms of the form :math:`S_{i,i}(\bar{x_i}x_i)` are all real.
				* Terms of the form :math:`S_{i,j}(\bar{x_i}x_j+\bar{x_j}x_i)=S_{i,j}\left((a_i-ib_i)(a_i+ib_i)+(a_i+ib_i)(a_i-ib_i)\right)` which is also real.
				* Therefore, :math:`\lambda` must be real.
		* The eigenvectors are orthogonal

			* Proof Hint: Involve null-space and utilise the fact that for symmetric matrices, row-space and column-space are the same.

				* For some :math:`i\neq j`, let :math:`\lambda_i` and :math:`\lambda_j` be two eigenvalues with corresponding eigenvectors :math:`\mathbf{x}_i` and :math:`\mathbf{x}_j`.
				* We have :math:`(\mathbf{S}-\lambda_i\mathbf{I})\mathbf{x}_i=\mathbf{0}`. Therefore

					.. math:: \mathbf{x}_i\in N(\mathbf{S}-\lambda_i\mathbf{I})
				* We also have :math:`(\mathbf{S}-\lambda_i\mathbf{I})\mathbf{x}_j=(\lambda_j-\lambda_i)\mathbf{x}_j`. Therefore

					.. math:: \mathbf{x}_j\in C(\mathbf{S}-\lambda_i\mathbf{I})=C((\mathbf{S}-\lambda_i\mathbf{I})^\top)
				* Therefore, :math:`\mathbf{x}_i\mathop{\bot}\mathbf{x}_j` for :math:`i\neq j`.

.. tip::
	* We usually write :math:`\mathbf{S}=\mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^\top`
	* Every matrix in this form is symmetric

		.. math:: \mathbf{S}^\top=(\mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^\top)^\top=(\mathbf{Q}^\top)^\top\boldsymbol{\Lambda}^\top\mathbf{Q}^\top=\mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^\top=\mathbf{S}

Positive Definite Matrices
--------------------------------------------------------------------------------
Multiplication by a pd matrix is similar to multiplying by a positive real number.

Tests for positive definiteness
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. note::
	* All eigenvalues are positive.
	* **Quadratic Form**: For any vector :math:`\mathbf{x}\neq\mathbf{0}`, :math:`\mathbf{x}^\top\mathbf{S}\mathbf{x} > 0`.
	* The matrix :math:`\mathbf{S}` can be factorised as :math:`\mathbf{S}=\mathbf{A}^\top\mathbf{A}`.

		* Choices for :math:`\mathbf{A}` can be

			* :math:`\mathbf{S}=\mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^\top=(\sqrt{\boldsymbol{\Lambda}}\mathbf{Q}^\top)^\top(\sqrt{\boldsymbol{\Lambda}}\mathbf{Q}^\top)=\mathbf{A}^\top\mathbf{A}`
			* :math:`\mathbf{S}=\mathbf{L}\mathbf{L}^\top`
	* The leading determinants :math:`D_1,D_2,\cdots,D_n` are all positive.
	* In LU elimination, the pivot elements are all positive.

Positive Semi-definite Matrices
--------------------------------------------------------------------------------
.. note::
	* All eigenvalues are :math:`\geq 0`
	* **Quadratic Form**: For any vector :math:`\mathbf{x}\neq\mathbf{0}`, :math:`\mathbf{x}^\top\mathbf{S}\mathbf{x} \geq 0`.

********************************************************************************
Singular Value Decomposition
********************************************************************************
.. csv-table:: Comparison with Eigen stuff
	:header: "Eigenvalues/vectors", "Singular values/vectors"
	:align: center
	:widths: 40, 40
	:class: longtable

	Works with :math:`\mathbf{A}:\mathbb{R}^n\mapsto\mathbb{R}^n`, Works with :math:`\mathbf{A}:\mathbb{R}^n\mapsto\mathbb{R}^m`	
	Eigenvalues can be real or complex., Singular values are non-negative real numbers.
	Eigenvectors are not always to be orthogonal to one another., Singular vectors are orthonormal.

.. tip::
	* Eigen decompositon finds directions in the :math:`\mathbb{R}^n` (input=output) space that are invariant under the operator. Along those directions the operator acts the same as a scalar multiplier.
	* Singular value decomposition finds directions in the input space :math:`\mathbb{R}^n` and a different set of directions in the output space :math:`\mathbb{R}^m` such that the operator produces a scaled version of these output vectors when applied to any vector in those input directions.

.. note::
	* Let :math:`\mathbf{v}\in\mathbb{R}^n` a singular vector in the input dimension, :math:`\mathbf{u}\in\mathbb{R}^m` a singular vector in the output dimension, and :math:`\sigma` be the singular value for the matrix :math:`\mathbf{A}`. Then

		.. math:: \mathbf{A}\mathbf{v}=\sigma\mathbf{u}
	* If all such vectors are collected in a matrix, then

		.. math:: \mathbf{A}\mathbf{V}=\mathbf{U}\boldsymbol{\Sigma}\implies\mathbf{A}=\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^\top
	* We note that :math:`\mathbf{V}` and :math:`\mathbf{U}` are not square, not are matrices with orthonormal columns while :math:`\mathbf{\Sigma}` is a square, diagonal matrix containing all the singular values.

Computing singular values and vectors
================================================================================
We formulate SVD via Eigendecomposition.

.. note::
	* Let :math:`\mathbf{M}=\mathbf{A}^\top\mathbf{A}` and :math:`\mathbf{N}=\mathbf{A}\mathbf{A}^\top`.

		* We note that :math:`\mathbf{M}` and :math:`\mathbf{N}` are symetric, hence have **real eigenvalues** and **orthonormal eigenvectors**.

			.. math:: \mathbf{M}^\top=(\mathbf{A}^\top\mathbf{A})^\top=\mathbf{A}^\top\mathbf{A}=\mathbf{M}\\\mathbf{N}^\top=(\mathbf{A}\mathbf{A}^\top)^\top=\mathbf{A}\mathbf{A}^\top=\mathbf{N}
	* Reformualtion in terms of eigen decomposition

		* :math:`\mathbf{M}=\mathbf{A}^\top\mathbf{A}=(\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^\top)^\top\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^\top=\mathbf{V}\boldsymbol{\Sigma}^\top(\mathbf{U}^\top\mathbf{U})\boldsymbol{\Sigma}\mathbf{V}^\top=\mathbf{V}\boldsymbol{\Sigma}^2\mathbf{V}^\top`
		* Similarly, :math:`\mathbf{N}=\mathbf{U}\boldsymbol{\Sigma}^2\mathbf{U}^\top`
	* From the 2 eigen decompositions we obtain 
	
		* **Singular values**: the diagonal matrix :math:`\boldsymbol{\Sigma}=\sqrt{\boldsymbol{\Lambda}}` and 
		* **Singular vectors**: the set of vectors :math:`\mathbf{U}` and :math:`\mathbf{V}`.
	* Since each :math:`\sigma=\sqrt{\lambda}`, we need to ensure that :math:`\lambda\geq 0`.

		* :math:`\mathbf{x}^\top\mathbf{M}\mathbf{x}=\mathbf{x}^\top\mathbf{A}^\top\mathbf{A}\mathbf{x}=||\mathbf{A}\mathbf{x}||\geq 0`
		* :math:`\mathbf{y}^\top\mathbf{N}\mathbf{y}=\mathbf{y}^\top\mathbf{A}\mathbf{A}^\top\mathbf{y}=||\mathbf{A}^\top\mathbf{y}||\geq 0`
		* Therefore, :math:`\mathbf{M}` and :math:`\mathbf{N}` are positive semi definite, so :math:`\sqrt{\lambda}` is real.

.. tip::
	* Singular values are arranged in descending order, :math:`\sigma_1\geq\sigma_2\geq\cdots\sigma_r\geq 0=\cdots 0`, where :math:`r` is the rank of the matrix.
	* Any linear transformation :math:`\mathbf{A}=\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^T` can be thought of as a 

		* pure rotation/flip
		* stretching
		* another pure rotation/flip

	* All the stretching happens due to :math:`\boldsymbol{\Sigma}`.
	* For any vector :math:`\mathbf{x}`, :math:`||\mathbf{A}\mathbf{x}||\leq \sigma_1||\mathbf{x}||` where :math:`\sigma_1` is the first singular value.

		* Proof Hint:

			* We have :math:`||\mathbf{A}\mathbf{x}||=||\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^T\mathbf{x}||`
			* Since :math:`\mathbf{U}` and :math:`\mathbf{V}` are matrices with orthonormal columns, they don't change the length.
			* Therefore, :math:`||\mathbf{A}\mathbf{x}||=||\boldsymbol{\Sigma}\mathbf{x}||=\sigma_1 x_1+\cdots+\sigma_n x_n\leq \sigma_1(x_1+\cdots+x_n)=\sigma_1||\mathbf{x}||`

.. attention::
	* If :math:`A` is a symmetric positive definite matrix, then its SVD is given by its eigen decomposition.
	* If :math:`A` is a matrix with orthonormal columns, then all its singular values are 1.


Eckart-Young: Best rank k approximation
================================================================================
.. attention::
	* Let :math:`\mathbf{A}_k=\sigma_1\mathbf{u}_1\mathbf{v}_1^\top+\cdots+\sigma_k\mathbf{u}_k\mathbf{v}_k^\top` for some :math:`k\leq r`.
	* Let :math:`\mathbf{B}` be any rank :math:`k` matrix.
	* We have :math:`||\mathbf{A}-\mathbf{A}_k||\leq ||\mathbf{A}-\mathbf{B}||`