Functional Analysis¶
Functions are vectors¶
Intuition¶
Tip
We usually think of vectors in finite dimensional case, e.g. \(\mathbf{u}\in\mathbb{R}^4\), as a list
\[\begin{split}\mathbf{u}=\begin{bmatrix}0.3426 \\1.3258 \\6.8943 \\8.3878 \\\end{bmatrix}\\\end{split}\]An alternate way to look at this is as a list of tuples, which binds the dimension (integer index in this case) and the value of that dimension
\[\mathbf{u}=\left((0,0.3426),(1,1.3258),(2,6.8943),(3,8.387)\right)\]This can be represented by a function
\[u:[0,1,2,3]\mapsto[0.3426,1.3258,6.8943,8.387]\]In general, these sort of functions can be represented by
\[u:\mathcal{I}\mapsto\mathbb{R}\]where \(\mathcal{I}\) is a finite index set.
If we extend \(\mathcal{I}\) to be countably infinite or uncountable (e.g. \(\mathbb{N}\) or \(\mathbb{R}\)), then, we end up with infinite lists of tuples, e.g.
\[f=\{(x,y)|x,y\in\mathbb{R}\}\]This is the way functions are defined in set theory, where the dimension of each function (informally) is \(|\mathcal{X}|\).
Cardinality of Function Space¶
Note
The space of functions \(f:\mathcal{X}\mapsto\mathcal{Y}\) is often denoted by \(\mathcal{Y}^{\mathcal{X}}\).
For every element \(x\in\mathcal{X}\), we have a choice to make its image map to some \(y\in\mathcal{Y}\).
Therefore, the size of this choice for each \(x\in\mathcal{X}\) is \(|\mathcal{Y}|\).
The size of the choice for all the elements in \(\mathcal{X}\) is therefore \(|\mathcal{Y}|^{|\mathcal{X}|}\).
Tip
To remember, we can think of the selection problem, where we have \(n\) items which are either selected or discarded.
We know from combinatorics that the total number of such choices is \(2^n\).
This can be formed as a function where each of the \(n\) elements map to either 0 or 1.
We also verify that function view of real-valued vectors which maps the index set (of size \(d\)) to reals also make sense since we represent the dimension as \(\mathbb{R}^d\).
Technicalities¶
As long as we’re restricting ourselves to the class of functions from one vector (linear) space \(\mathcal{X}\) to another \(\mathcal{Y}\) over the same underlying scalar field \(\mathbb{F}\), we can faithfully recover many useful properties from a finite dimensional vector spaces for the (potentially infinite dimensional) space of functions.
Addition and Scalar Multiplication¶
Note
For finite dimensional vectors \(\mathbf{u},\mathbf{v}\in V_{\mathcal{F}}\), where \(\mathcal{F}\) is the underlying field:
[Vector Addition]: We add the corresponding values for each dimension.
[Scalar Multiplication]: For any \(\alpha\in\mathcal{F}\), we multiply \(\alpha\in\mathcal{F}\) to the value of each dimension.
For functions \(f:\mathcal{X}\mapsto\mathcal{Y}\) and \(g:\mathcal{X}\mapsto\mathcal{Y}\), the dimensions are \(x\in\mathcal{X}\).
[Vector Addition]: We can define vector addition of a function for each “dimension” \(x\)
\[(f + g)(x) = f(x) + g(x)\][Scalar Multiplication]: We can define scalar multiplication for each “dimension” \(x\)
\[(\alpha\cdot f)(x) = \alpha\cdot f(x)\]
Tip
With \(\mathcal{X}\) and \(\mathcal{Y}\) being linear space over the same scalar field, the addition and scalar multiplication makes sense.
Inner Product¶
Note
For finite dimensional vectors \(\mathbf{u},\mathbf{v}\in V_{\mathcal{F}}\), to compute the inner (dot) product
We take a scalar product of the corresponding values for each dimension.
We sum the results across all dimensions.
\[\langle\mathbf{u},\mathbf{v}\rangle=\sum_{i=1}^n u_i\cdot v_i\]
Warning
Let’s add a constraint that \(\mathcal{Y}\) is equipped with an inner product.
Let’s add a constraint that \(\mathcal{X}\) is equipped with a positive measure \(\mu(x)\) and \(\mathop{d\mu}(x)=\mathop{dx}\).
Note
For functions \(f:\mathcal{X}\mapsto\mathcal{Y}\) and \(g:\mathcal{X}\mapsto\mathcal{Y}\)
We can take a scalar product for each dimension \(x\).
Since \(\mathcal{X}\) can be uncountable, we replace the sum with integration
\[\langle f,g\rangle=\int_{\mathcal{X}}f(x)\cdot g(x)\mathop{dx}\]
Tip
With \(\mathcal{Y}\) being an inner product space, dot product under the integral makes sense.
Orthogonality¶
Note
Two functions \(f\) and \(g\) are orthogonal if their inner product is 0.
Example: For real trig functions \(\sin:[0,\pi]\mapsto[0,1]\) and \(\cos:[0,\pi]\mapsto[0,1]\)
\[\langle\sin,\cos\rangle=\int_\limits{0}^{\pi}\sin(x)\cos(x)\mathop{dx}=0\]
Norm - Induced by the Inner Product¶
Lp Space¶
Note
The inner product for finite vectors induces a norm (\(l_2\))
\[||\mathbf{u}||_2^2=\langle \mathbf{u},\mathbf{u}\rangle=\sum_{i=1}^n|u_i|^2\]The inner product defined above induces a norm
\[||f||_2^2=\langle f,f\rangle=\int_{\mathcal{X}}|f(x)|^2\mathop{dx}\]More generally, we can have
\[||f||_{L_p}=\left(\int_{\mathcal{X}}|f(x)|^p\mathop{dx}\right)^{1/p}\]
Tip
For more general measurable spaces where we have a measure \(\mu(x)\) defined
\[||f||_{L_p(\mathcal{X},\mu)}=\left(\int_{\mathcal{X}}|f(x)|^p\mathop{d\mu}(x)\right)^{1/p}\]For \(p=\infty\)
\[||f||_{L_\infty(\mathcal{X},\mu)}=\text{ess}\sup_\limits{x\in\mathcal{X}}|f(x)|\]We write the function space as \(L^p(\mathcal{X},\mathcal{Y})=\{f|f:\mathcal{X}\mapsto\mathcal{Y};\text{such that }L_p(\mathcal{X,\mu})\text{ exists}\}\)
Example: \(L^2([0,1],\mathbb{R})\)
Metric - Induced by the Norm¶
Note
The \(l_p\) norm for finite vectors induces a metric
\[d(\mathbf{u}, \mathbf{v})=||\mathbf{u}-\mathbf{v}||_2=\left(\sum_{i=1}^n|u_i-v_i|^p\right)^{1/p}\]We can define, similarly, for functions
\[d(f, g)=||f-g||_{L_p(\mathcal{X},\mu)}=\left(\int_\limits{i=1}^n|f(x)-g(x)|^p\mathop{d\mu}(x)\right)^{1/p}\]If \(d(f, g)=0\), then the functions are the same “almost everywhere”.
In this case, they are different for at most finitely many “dimensions”.
Topological Properties¶
Note
With a metric defined, we can define topological properties such as convergence and complete function spaces.
Tip
Complete normed spaces are known as Banach Space.
Complete inner product spaces are known as Hilbert Space.
Warning
Without the metric, the only topology we can have for the set of functions is the product topology (as suggested by \(\mathcal{Y}^{\mathcal{X}}\)).
With product topology, the only convergence that we can define is point-wise convergence which is a weak form of convergence.
Point-wise Convergence¶
Note
Let \((f_n)_{n=1}^\infty\) be a sequence of functions where \(f_n:\mathcal{X}\mapsto\mathcal{Y}\).
Let \(f\) be another function \(f:\mathcal{X}\mapsto\mathcal{Y}\)
We say that the sequence is point-wise converging towards \(f\)
\[\lim\limits_{n\to\infty}f_n=f\iff\forall x\in\mathcal{X}, \lim\limits_{n\to\infty}f_n(x)=f(x)\]
Uniform Convergence¶
Note
We can make a stronger convergence criterion for functions which map to a metric space \(\mathcal{Y}\) (i.e., where \(\sup\) makes sense).
We say that the sequence is uninformly converging towards \(f\)
\[\lim\limits_{n\to\infty}f_n=f\iff\lim\limits_{n\to\infty}\sup\{|f_n(x)-f(x)|:x\in\mathcal{X}\}=0\]
Tip
All uniformly convergent functions are point-wise convergent.
The converse is not true.
Linear Transforms¶
Linear Transforms on Euclidean Space¶
Note
We consider an normalized set of basis vectors (i.e. of unit-length but not necessarily orthogonal) in \(\mathbb{R}^n\) for a finite dimensional vector space as
\[\{\mathbf{a}_1,\cdots\mathbf{a}_n\}\]We can find the projection of any vector \(\mathbf{u}\in\mathbb{R}^n\) onto each of the basis
\[\langle\mathbf{a}_i,\mathbf{u}\rangle\]Under the new basis, this gives the i-th co-ordinate for the result vector \(\mathbf{v}\)
\[\begin{split}\begin{bmatrix}0\\\vdots\\v_i\\\vdots\\0\end{bmatrix}=\langle\mathbf{a}_i,\mathbf{u}\rangle\end{split}\]We note that we can collect the basis vectors inside a matrix as rows and express the relation as
\[\begin{split}\mathbf{v}=\begin{bmatrix}-&\mathbf{a_1^*}&-\\ \vdots&\vdots&\vdots\\ -&\mathbf{a_n^*}&-\\\end{bmatrix}\mathbf{u}=\mathbf{A}^T\mathbf{u}\end{split}\]We also note that the final vector can be written as a sum
\[\begin{split}\mathbf{v}=\sum_{i=1}^n\begin{bmatrix}0\\\vdots\\v_i\\\vdots\\0\end{bmatrix}=\sum_{i=1}^n\langle\mathbf{u},\mathbf{a}_i\rangle\end{split}\]
Function view¶
Note
Let \(\mathcal{I}=\{1,\cdots,d\}\) be the index set.
The vector \(\mathbf{u}\) defines a function \(u:\mathcal{I}\mapsto\mathbb{R}\in\mathcal{U}(\mathcal{I})\)
The basis vectors \(\mathbf{a}_k\) are also functions \(a_k:\mathcal{I}\mapsto\mathcal{H}\) where \(\mathcal{H}^{\mathcal{I}}\subseteq\mathbb{R}^d\) is the subspace spanned by the basis.
Then the matrix \(\mathbf{A}\) defines a linear transform \(A:\mathcal{H}\mapsto\mathcal{U}(\mathcal{I})\) where
\[v(i)=(Au)(i)\]
Linear Transforms on Function Space¶
Note
We consider functions \(f:E\mapsto\mathbb{R}\in\mathcal{F}(E)\) (similar to \(\mathcal{U}(\mathcal{I})\)) where \(E\) is any abstract space.
We consider basis functions \(h:E\mapsto\mathcal{H}\) where \(\mathcal{H}\) is a Hilbert space equipped with a norm
\[\langle \cdot,\cdot\rangle_{\mathcal{H}}\]For any \(x\in E\), the projection onto a basis is given by
\[f(x)=\langle f,h(x)\rangle_{\mathcal{H}}\]This defines a linear transform \(L:\mathcal{H}\mapsto\mathcal{F}(E)\) where
\[f(\cdot)=(Lf)(\cdot)\]
Kernel of a Linear Transform¶
Note
The kernel of a linear transform is found by inner products of the basis functions.
For finite dimensional case \(K:\mathcal{I}\times\mathcal{I}\mapsto\mathbb{R}\)
\[K(i,j)=\langle\mathbf{a}^*_i,\mathbf{a}^*_j\rangle\]For the functional case \(K:E\times E\mapsto\mathbb{R}\)
\[K(x,y)=\langle h(y),h(x)\rangle_{\mathcal{H}}\]For any fixed \(y_n\in E\), \(K(x,y_n)\) is a function of just \(x\in E\) in \(\mathcal{F}(E)\)
Therefore, often the basis functions are referred by just the kernel itself as
\[K(\cdot,x)=h(x)\]and
\[K(x,y)=\langle K(\cdot,x),K(\cdot,y)\rangle\]Symmetry: \(K(x,y)=K(y,x)\) due to the symmetry of the inner product.
Reproducing Property:
Function evaluation is inner product: \(f(x)=\langle f,K(\cdot,x)\rangle_{\mathcal{H}}\)
Therefore, knowing the kernel, we can recover any function.
Any function \(g\in\mathcal{F}(E)\) can be expressed as a linear combination of the basis
\[g(x)=\sum_m\alpha_m K(x,y_m)\]
Tip
We note that when the matrix is the centered, normalised data matrix, then the kernel gives the sample covariance matrix.
This hints as the usability of functional kernels for covariance functions for infinite dimensional Gaussian distributions (GPs).
Moore-Aronszajn Theorem¶
Warning
We assume that the kernel is positive definite.
Then there exists a unique \(\mathcal{H}_k\) for which \(K\) is the kernel basis.
Mercer Theorem - Eigenfunctions¶
Note
TODO
Other Integral Transforms¶
General Form¶
Note
TODO
Fourier Basis¶
Note
For “well-behaved” (i.e. square-integrable so that one can define \(L_2\) norm as per above) periodic functions, we can have basis functions of odd and even frequencies.
Schauder basis (allows for infinite sum over basis):
A basis for functions in \(L^2([0,1],\mathbb{R})\) can be defined in terms of an infinite set of orthonormal functions`
\[\{1, (\sqrt{2}\sin(2\pi nx))_{n=1}^\infty, (\sqrt{2}\cos(2\pi nx))_{n=1}^\infty\}\]The \(\sin\) functions account for odd-frequencies and the \(\cos\) functions account for even-frequencies.
Here we have 3 sets of basis functions, so we use 3 different kinds of normalised-projection co-efficients, \(a_0,a_i,b_i\)
\[f(x)=a_0\cdot1+\sum_{n=1}^\infty a_i\cdot\cos(2\pi nx)+\sum_{n=1}^\infty b_i\cdot\sin(2\pi nx)\]\(a_0\) computes the projection of \(f(x)\) onto the constant function \(1\).
\[a_0=\frac{\int_\limits{[0,1]}1\cdot f(x)\mathop{dx}}{\int_\limits{[0,1]}1\cdot 1\mathop{dx}}=\int_\limits{[0,1]}f(x)\mathop{dx}\]For each \(k>0\), \(a_k\) computes the projection of \(f(x)\) onto the even frequencies, \(\sqrt{2}\cos(2\pi nx)\).
\[a_k=\frac{\int_\limits{[0,1]}f(x)\cdot\sqrt{2}\cos(2\pi kx)\mathop{dx}}{\int_\limits{[0,1]}\sqrt{2}\cos(2\pi kx)\cdot\sqrt{2}\cos(2\pi kx)\mathop{dx}}\]Similarly, for \(b_k\).