################################################################################ Functional Analysis ################################################################################ ******************************************************************************** Functions are vectors ******************************************************************************** Intuition ================================================================================ .. tip:: * We usually think of vectors in finite dimensional case, e.g. :math:`\mathbf{u}\in\mathbb{R}^4`, as a list .. math:: \mathbf{u}=\begin{bmatrix}0.3426 \\1.3258 \\6.8943 \\8.3878 \\\end{bmatrix}\\ * An alternate way to look at this is as a list of tuples, which binds the dimension (integer index in this case) and the value of that dimension .. math:: \mathbf{u}=\left((0,0.3426),(1,1.3258),(2,6.8943),(3,8.387)\right) * This can be represented by a function .. math:: u:[0,1,2,3]\mapsto[0.3426,1.3258,6.8943,8.387] * In general, these sort of functions can be represented by .. math:: u:\mathcal{I}\mapsto\mathbb{R} where :math:`\mathcal{I}` is a finite index set. * If we extend :math:`\mathcal{I}` to be countably infinite or uncountable (e.g. :math:`\mathbb{N}` or :math:`\mathbb{R}`), then, we end up with infinite lists of tuples, e.g. .. math:: f=\{(x,y)|x,y\in\mathbb{R}\} * This is the way functions are defined in set theory, where the dimension of each function (informally) is :math:`|\mathcal{X}|`. Cardinality of Function Space -------------------------------------------------------------------------------- .. note:: * The space of functions :math:`f:\mathcal{X}\mapsto\mathcal{Y}` is often denoted by :math:`\mathcal{Y}^{\mathcal{X}}`. * For every element :math:`x\in\mathcal{X}`, we have a choice to make its image map to some :math:`y\in\mathcal{Y}`. * Therefore, the size of this choice for each :math:`x\in\mathcal{X}` is :math:`|\mathcal{Y}|`. * The size of the choice for **all** the elements in :math:`\mathcal{X}` is therefore :math:`|\mathcal{Y}|^{|\mathcal{X}|}`. .. tip:: * To remember, we can think of the selection problem, where we have :math:`n` items which are either selected or discarded. * We know from combinatorics that the total number of such choices is :math:`2^n`. * This can be formed as a function where each of the :math:`n` elements map to either 0 or 1. * We also verify that function view of real-valued vectors which maps the index set (of size :math:`d`) to reals also make sense since we represent the dimension as :math:`\mathbb{R}^d`. Technicalities ================================================================================ As long as we're restricting ourselves to the class of functions from one vector (linear) space :math:`\mathcal{X}` to another :math:`\mathcal{Y}` over the same underlying scalar field :math:`\mathbb{F}`, we can faithfully recover many useful properties from a finite dimensional vector spaces for the (potentially infinite dimensional) space of functions. Addition and Scalar Multiplication -------------------------------------------------------------------------------- .. note:: * For finite dimensional vectors :math:`\mathbf{u},\mathbf{v}\in V_{\mathcal{F}}`, where :math:`\mathcal{F}` is the underlying field: * [Vector Addition]: We add the corresponding values for each dimension. * [Scalar Multiplication]: For any :math:`\alpha\in\mathcal{F}`, we multiply :math:`\alpha\in\mathcal{F}` to the value of each dimension. * For functions :math:`f:\mathcal{X}\mapsto\mathcal{Y}` and :math:`g:\mathcal{X}\mapsto\mathcal{Y}`, the dimensions are :math:`x\in\mathcal{X}`. * [Vector Addition]: We can define vector addition of a function for each "dimension" :math:`x` .. math:: (f + g)(x) = f(x) + g(x) * [Scalar Multiplication]: We can define scalar multiplication for each "dimension" :math:`x` .. math:: (\alpha\cdot f)(x) = \alpha\cdot f(x) .. tip:: With :math:`\mathcal{X}` and :math:`\mathcal{Y}` being linear space over the same scalar field, the addition and scalar multiplication makes sense. Inner Product -------------------------------------------------------------------------------- .. note:: * For finite dimensional vectors :math:`\mathbf{u},\mathbf{v}\in V_{\mathcal{F}}`, to compute the inner (dot) product * We take a scalar product of the corresponding values for each dimension. * We sum the results across all dimensions. .. math:: \langle\mathbf{u},\mathbf{v}\rangle=\sum_{i=1}^n u_i\cdot v_i .. warning:: * Let's add a constraint that :math:`\mathcal{Y}` is equipped with an inner product. * Let's add a constraint that :math:`\mathcal{X}` is equipped with a positive measure :math:`\mu(x)` and :math:`\mathop{d\mu}(x)=\mathop{dx}`. .. note:: * For functions :math:`f:\mathcal{X}\mapsto\mathcal{Y}` and :math:`g:\mathcal{X}\mapsto\mathcal{Y}` * We can take a scalar product for each dimension :math:`x`. * Since :math:`\mathcal{X}` can be uncountable, we replace the sum with integration .. math:: \langle f,g\rangle=\int_{\mathcal{X}}f(x)\cdot g(x)\mathop{dx} .. tip:: With :math:`\mathcal{Y}` being an inner product space, dot product under the integral makes sense. Orthogonality ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. note:: * Two functions :math:`f` and :math:`g` are orthogonal if their inner product is 0. * Example: For real trig functions :math:`\sin:[0,\pi]\mapsto[0,1]` and :math:`\cos:[0,\pi]\mapsto[0,1]` .. math:: \langle\sin,\cos\rangle=\int_\limits{0}^{\pi}\sin(x)\cos(x)\mathop{dx}=0 Norm - Induced by the Inner Product -------------------------------------------------------------------------------- Lp Space ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. note:: * The inner product for finite vectors induces a norm (:math:`l_2`) .. math:: ||\mathbf{u}||_2^2=\langle \mathbf{u},\mathbf{u}\rangle=\sum_{i=1}^n|u_i|^2 * The inner product defined above induces a norm .. math:: ||f||_2^2=\langle f,f\rangle=\int_{\mathcal{X}}|f(x)|^2\mathop{dx} * More generally, we can have .. math:: ||f||_{L_p}=\left(\int_{\mathcal{X}}|f(x)|^p\mathop{dx}\right)^{1/p} .. tip:: * For more general measurable spaces where we have a measure :math:`\mu(x)` defined .. math:: ||f||_{L_p(\mathcal{X},\mu)}=\left(\int_{\mathcal{X}}|f(x)|^p\mathop{d\mu}(x)\right)^{1/p} * For :math:`p=\infty` .. math:: ||f||_{L_\infty(\mathcal{X},\mu)}=\text{ess}\sup_\limits{x\in\mathcal{X}}|f(x)| * We write the function space as :math:`L^p(\mathcal{X},\mathcal{Y})=\{f|f:\mathcal{X}\mapsto\mathcal{Y};\text{such that }L_p(\mathcal{X,\mu})\text{ exists}\}` * Example: :math:`L^2([0,1],\mathbb{R})` Metric - Induced by the Norm -------------------------------------------------------------------------------- .. note:: * The :math:`l_p` norm for finite vectors induces a metric .. math:: d(\mathbf{u}, \mathbf{v})=||\mathbf{u}-\mathbf{v}||_2=\left(\sum_{i=1}^n|u_i-v_i|^p\right)^{1/p} * We can define, similarly, for functions .. math:: d(f, g)=||f-g||_{L_p(\mathcal{X},\mu)}=\left(\int_\limits{i=1}^n|f(x)-g(x)|^p\mathop{d\mu}(x)\right)^{1/p} * If :math:`d(f, g)=0`, then the functions are the same "almost everywhere". * In this case, they are different for **at most** finitely many "dimensions". Topological Properties -------------------------------------------------------------------------------- .. note:: With a metric defined, we can define topological properties such as convergence and complete function spaces. .. tip:: * Complete normed spaces are known as `Banach Space `_. * Complete inner product spaces are known as `Hilbert Space `_. .. warning:: * Without the metric, the only topology we can have for the set of functions is the `product topology `_ (as suggested by :math:`\mathcal{Y}^{\mathcal{X}}`). * With product topology, the only convergence that we can define is `point-wise convergence `_ which is a weak form of convergence. Point-wise Convergence ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. note:: * Let :math:`(f_n)_{n=1}^\infty` be a sequence of functions where :math:`f_n:\mathcal{X}\mapsto\mathcal{Y}`. * Let :math:`f` be another function :math:`f:\mathcal{X}\mapsto\mathcal{Y}` * We say that the sequence is point-wise converging towards :math:`f` .. math:: \lim\limits_{n\to\infty}f_n=f\iff\forall x\in\mathcal{X}, \lim\limits_{n\to\infty}f_n(x)=f(x) Uniform Convergence ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. note:: * We can make a stronger convergence criterion for functions which map to a metric space :math:`\mathcal{Y}` (i.e., where :math:`\sup` makes sense). * We say that the sequence is uninformly converging towards :math:`f` .. math:: \lim\limits_{n\to\infty}f_n=f\iff\lim\limits_{n\to\infty}\sup\{|f_n(x)-f(x)|:x\in\mathcal{X}\}=0 .. tip:: * All uniformly convergent functions are point-wise convergent. * The converse is not true. Linear Transforms -------------------------------------------------------------------------------- Linear Transforms on Euclidean Space ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. note:: * We consider an normalized set of basis vectors (i.e. of unit-length but not necessarily orthogonal) in :math:`\mathbb{R}^n` for a finite dimensional vector space as .. math:: \{\mathbf{a}_1,\cdots\mathbf{a}_n\} * We can find the projection of any vector :math:`\mathbf{u}\in\mathbb{R}^n` onto each of the basis .. math:: \langle\mathbf{a}_i,\mathbf{u}\rangle * Under the new basis, this gives the i-th co-ordinate for the result vector :math:`\mathbf{v}` .. math:: \begin{bmatrix}0\\\vdots\\v_i\\\vdots\\0\end{bmatrix}=\langle\mathbf{a}_i,\mathbf{u}\rangle * We note that we can collect the basis vectors inside a matrix as rows and express the relation as .. math:: \mathbf{v}=\begin{bmatrix}-&\mathbf{a_1^*}&-\\ \vdots&\vdots&\vdots\\ -&\mathbf{a_n^*}&-\\\end{bmatrix}\mathbf{u}=\mathbf{A}^T\mathbf{u} * We also note that the final vector can be written as a sum .. math:: \mathbf{v}=\sum_{i=1}^n\begin{bmatrix}0\\\vdots\\v_i\\\vdots\\0\end{bmatrix}=\sum_{i=1}^n\langle\mathbf{u},\mathbf{a}_i\rangle Function view ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. note:: * Let :math:`\mathcal{I}=\{1,\cdots,d\}` be the index set. * The vector :math:`\mathbf{u}` defines a function :math:`u:\mathcal{I}\mapsto\mathbb{R}\in\mathcal{U}(\mathcal{I})` * The basis vectors :math:`\mathbf{a}_k` are also functions :math:`a_k:\mathcal{I}\mapsto\mathcal{H}` where :math:`\mathcal{H}^{\mathcal{I}}\subseteq\mathbb{R}^d` is the subspace spanned by the basis. * Then the matrix :math:`\mathbf{A}` defines a linear transform :math:`A:\mathcal{H}\mapsto\mathcal{U}(\mathcal{I})` where .. math:: v(i)=(Au)(i) Linear Transforms on Function Space ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. note:: * We consider functions :math:`f:E\mapsto\mathbb{R}\in\mathcal{F}(E)` (similar to :math:`\mathcal{U}(\mathcal{I})`) where :math:`E` is any abstract space. * We consider basis functions :math:`h:E\mapsto\mathcal{H}` where :math:`\mathcal{H}` is a Hilbert space equipped with a norm .. math:: \langle \cdot,\cdot\rangle_{\mathcal{H}} * For any :math:`x\in E`, the projection onto a basis is given by .. math:: f(x)=\langle f,h(x)\rangle_{\mathcal{H}} * This defines a linear transform :math:`L:\mathcal{H}\mapsto\mathcal{F}(E)` where .. math:: f(\cdot)=(Lf)(\cdot) Kernel of a Linear Transform ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. note:: * The kernel of a linear transform is found by inner products of the basis functions. * For finite dimensional case :math:`K:\mathcal{I}\times\mathcal{I}\mapsto\mathbb{R}` .. math:: K(i,j)=\langle\mathbf{a}^*_i,\mathbf{a}^*_j\rangle * For the functional case :math:`K:E\times E\mapsto\mathbb{R}` .. math:: K(x,y)=\langle h(y),h(x)\rangle_{\mathcal{H}} * For any fixed :math:`y_n\in E`, :math:`K(x,y_n)` is a function of just :math:`x\in E` in :math:`\mathcal{F}(E)` * Therefore, often the basis functions are referred by just the kernel itself as .. math:: K(\cdot,x)=h(x) and .. math:: K(x,y)=\langle K(\cdot,x),K(\cdot,y)\rangle * Symmetry: :math:`K(x,y)=K(y,x)` due to the symmetry of the inner product. * Reproducing Property: * Function evaluation is inner product: :math:`f(x)=\langle f,K(\cdot,x)\rangle_{\mathcal{H}}` * Therefore, knowing the kernel, we can recover any function. * Any function :math:`g\in\mathcal{F}(E)` can be expressed as a linear combination of the basis .. math:: g(x)=\sum_m\alpha_m K(x,y_m) .. tip:: * We note that when the matrix is the centered, normalised data matrix, then the kernel gives the sample covariance matrix. * This hints as the usability of functional kernels for covariance functions for infinite dimensional Gaussian distributions (GPs). Moore-Aronszajn Theorem ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. warning:: * We assume that the kernel is positive definite. * Then there exists a unique :math:`\mathcal{H}_k` for which :math:`K` is the kernel basis. Mercer Theorem - Eigenfunctions ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. note:: TODO Other Integral Transforms -------------------------------------------------------------------------------- General Form ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. note:: TODO Fourier Basis ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. note:: * For "well-behaved" (i.e. square-integrable so that one can define :math:`L_2` norm as per above) periodic functions, we can have `basis functions of odd and even frequencies `_. * `Schauder basis `_ (allows for infinite sum over basis): * A basis for functions in :math:`L^2([0,1],\mathbb{R})` can be defined in terms of an infinite set of orthonormal functions` .. math:: \{1, (\sqrt{2}\sin(2\pi nx))_{n=1}^\infty, (\sqrt{2}\cos(2\pi nx))_{n=1}^\infty\} * The :math:`\sin` functions account for odd-frequencies and the :math:`\cos` functions account for even-frequencies. * Here we have 3 sets of basis functions, so we use 3 different kinds of normalised-projection co-efficients, :math:`a_0,a_i,b_i` .. math:: f(x)=a_0\cdot1+\sum_{n=1}^\infty a_i\cdot\cos(2\pi nx)+\sum_{n=1}^\infty b_i\cdot\sin(2\pi nx) * :math:`a_0` computes the projection of :math:`f(x)` onto the constant function :math:`1`. .. math:: a_0=\frac{\int_\limits{[0,1]}1\cdot f(x)\mathop{dx}}{\int_\limits{[0,1]}1\cdot 1\mathop{dx}}=\int_\limits{[0,1]}f(x)\mathop{dx} * For each :math:`k>0`, :math:`a_k` computes the projection of :math:`f(x)` onto the even frequencies, :math:`\sqrt{2}\cos(2\pi nx)`. .. math:: a_k=\frac{\int_\limits{[0,1]}f(x)\cdot\sqrt{2}\cos(2\pi kx)\mathop{dx}}{\int_\limits{[0,1]}\sqrt{2}\cos(2\pi kx)\cdot\sqrt{2}\cos(2\pi kx)\mathop{dx}} * Similarly, for :math:`b_k`. ******************************************************************************** Useful Function Spaces ******************************************************************************** Sobolev Space ================================================================================ Holder Space ================================================================================ Reproducing kernel Hilbert space ================================================================================ ******************************************************************************** Generalised Functions - Distributions ******************************************************************************** Dirac-Delta Function ================================================================================ .. seealso:: * `Functions are vectors `_ * `THE GEOMETRY OF MATHEMATICAL METHODS `_ * `Math 353 Lecture Notes Fourier series `_ * `[MIT] 9.520 Math Camp 2010 Functional Analysis Review `_ * `SO post about Dirac delta being a generalized function instead of normal function `_