################################################################################
Parametric Point Estimation
################################################################################

********************************************************************************
Classical Inference
********************************************************************************

Method of Moments Estimator (MOM)
================================================================================

.. note::

    * Let the parameter vector be :math:`\boldsymbol{\theta}=(\theta_1,\cdots,\theta_k)`.
    * Let the estimator be :math:`\widehat{\Theta}_n=(\widehat{\theta}_1,\cdots,\widehat{\theta}_k)`.
    * Let :math:`\alpha_j=\alpha_j(\boldsymbol{\theta})=\mathbb{E}_{\boldsymbol{\theta}}[X^j]=\int x^j\mathop{dF_{\boldsymbol{\theta}}}(x)` be the :math:`j`-th moment, for :math:`1\leq j\leq k`.
    * We assume that the moments exist and can be expressed in closed form in terms of the parameters :math:`\theta_j`.
    * Estimate each moment with the corresponding sample moment and equate the two:

      .. math:: \alpha_j(\widehat{\Theta}_n)=\widehat{\alpha}_j=\frac{1}{n}\sum_{i=1}^n X_i^j

    * This gives a system of :math:`k` equations in the :math:`k` unknowns :math:`(\widehat{\theta}_j)_{j=1}^k`.

Properties
--------------------------------------------------------------------------------

.. seealso::

    * **Consistent**: :math:`\widehat{\Theta}_n\xrightarrow[]{P}\boldsymbol{\theta}`
    * **Asymptotically normal**:

      * TODO write expression

Common Estimators
--------------------------------------------------------------------------------

Bernoulli
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. note::

    * We have samples :math:`X=(X_1,\cdots,X_n)` from a Bernoulli with unknown :math:`p`.
    * The first moment is :math:`\alpha_1=p`, so :math:`\widehat{p}=\widehat{\alpha}_1=\frac{1}{n}\sum_{i=1}^n X_i`.

Normal
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. note::

    * We have samples :math:`X=(X_1,\cdots,X_n)` from a Normal with unknown :math:`\mu,\sigma`.
    * The first moment is :math:`\alpha_1=\mu`, estimated by :math:`\widehat{\alpha}_1=\frac{1}{n}\sum_{i=1}^n X_i`.
    * The second moment is :math:`\alpha_2=\mu^2+\sigma^2`, estimated by :math:`\widehat{\alpha}_2=\frac{1}{n}\sum_{i=1}^n X^2_i`.
    * Solving the two equations gives

      .. math:: \widehat{\mu}=\frac{1}{n}\sum_{i=1}^n X_i,\qquad\widehat{\sigma}^2=\frac{1}{n}\sum_{i=1}^n X_i^2-\widehat{\mu}^2=\frac{1}{n}\sum_{i=1}^n (X_i-\widehat{\mu})^2

Maximum Likelihood Estimator (MLE)
================================================================================

Likelihood function
--------------------------------------------------------------------------------

.. note::

    * We assume a sample of size :math:`n`, :math:`X=(X_1,\cdots,X_n)`, such that :math:`X_i\sim f_{X_i}(x_i; \theta)`.
    * The likelihood function is defined as :math:`\mathcal{L}_n(\theta)=f_X(x; \theta)=f_{X_1,\cdots,X_n}(x_1,\cdots,x_n;\theta)`.

.. warning::

    * Given a particular observation :math:`X=x=(x_1,\cdots,x_n)`, the function :math:`\mathcal{L}_n(\theta)=f_X(x; \theta)` is no longer a density, but just a function of :math:`\theta`.
    * For the discrete case, :math:`\mathcal{L}_n(\theta)=p_X(x; \theta)=\mathbb{P}(X_1=x_1,\cdots,X_n=x_n;\theta)`.

      * This is the probability of observing exactly the current data under a particular :math:`\theta`.
      * If this probability is higher when :math:`\theta=\theta_i` than when :math:`\theta=\theta_j`, then :math:`\theta_i` is the more plausible value of the underlying parameter given the data.

.. note::

    We estimate :math:`\widehat{\Theta}_n=\widehat{\Theta}_{\text{ML}}=\mathop{\underset{\theta}{\mathrm{argmax}}}\mathcal{L}_n(\theta)`.

Log likelihood
--------------------------------------------------------------------------------

* Independence assumption:

  .. math:: \mathcal{L}_n(\theta)=f_{X_1,\cdots,X_n}(x_1,\cdots,x_n;\theta)=\prod_{i=1}^n f_{X_i}(x_i;\theta)

* Identical distribution assumption:

  .. math:: \mathcal{L}_n(\theta)=f_{X_1,\cdots,X_n}(x_1,\cdots,x_n;\theta)=\prod_{i=1}^n f_X(x_i;\theta)

* The log likelihood is defined as

  .. math:: \ell_n(\theta)=\log\mathcal{L}_n(\theta)=\sum_{i=1}^n \log(f_X(x_i;\theta))

* As :math:`\log` is a monotonically increasing function,

  .. math:: \mathop{\underset{\theta}{\mathrm{argmax}}}\ell_n(\theta)=\mathop{\underset{\theta}{\mathrm{argmax}}}\mathcal{L}_n(\theta)

Properties
--------------------------------------------------------------------------------

.. note::

    * **Consistent**: :math:`\widehat{\Theta}_{\text{ML}}\xrightarrow[]{P}\theta`.

      * Proof hint: use the KL divergence between the distribution at the true value :math:`\theta_{\text{true}}` and the distribution at an arbitrary :math:`\theta`.
      * The log likelihood at the true value, :math:`\ell_n(\theta_{\text{true}})`, does not depend on :math:`\theta`.
      * Maximising :math:`\ell_n(\theta)` is therefore the same as maximising

        .. math:: M_n(\theta)=\frac{1}{n}\left(\ell_n(\theta)-\ell_n(\theta_{\text{true}})\right)=\frac{1}{n}\sum_{i=1}^n\log\left(\frac{f_X(x_i;\theta)}{f_X(x_i;\theta_{\text{true}})}\right)

      * Let :math:`M(\theta)` be the expectation of this random variable under :math:`\theta_{\text{true}}`:

        .. math:: M(\theta)=\mathbb{E}_{\theta_\text{true}}\left[\log\left(\frac{f_X(X;\theta)}{f_X(X;\theta_{\text{true}})}\right)\right]=\int\log\left(\frac{f_X(x;\theta)}{f_X(x;\theta_{\text{true}})}\right)f_X(x;\theta_{\text{true}})\mathop{dx}=-D_{KL}(\theta_{\text{true}},\theta)

      * Since :math:`D_{KL}\geq 0`, the maximum value of :math:`M(\theta)` is 0, attained at :math:`\theta=\theta_{\text{true}}`.
      * For each :math:`\theta`, :math:`M_n(\theta)\xrightarrow[]{P}M(\theta)` by the law of large numbers.
      * Technically, a formal proof needs this convergence to be uniform in :math:`\theta`.

    * **Equivariant**: If :math:`\widehat{\Theta}_{\text{ML}}` is the MLE for :math:`\theta`, then :math:`g(\widehat{\Theta}_{\text{ML}})` is the MLE for :math:`g(\theta)`.

      * TODO proof

    * **Asymptotically normal**: :math:`\frac{\widehat{\Theta}_{\text{ML}}-\theta}{\widehat{\text{se}}}\xrightarrow[]{D}\mathcal{N}(0,1)`

      * Score function: :math:`s(X;\theta)=\frac{\partial}{\partial\theta}\log f(X;\theta)`
      * Fisher information: :math:`I_n(\theta)=\mathbb{V}_\theta\left(\sum_{i=1}^n s(X_i;\theta)\right)=\sum_{i=1}^n\mathbb{V}_\theta\left(s(X_i;\theta)\right)`, where the second equality uses independence of the samples.

    * **Asymptotically optimal**: Among well-behaved estimators, the MLE has the smallest variance for large sample sizes.

      * TODO proof

Computing CI for MLE
--------------------------------------------------------------------------------

Common Estimators
--------------------------------------------------------------------------------

Bernoulli
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Uniform
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Binomial
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Geometric
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Multinomial
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Exponential
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Normal
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Iterative Method of Computation
--------------------------------------------------------------------------------

.. note::

    * For complicated or composite rvs, computing the likelihood in closed form can be challenging.
    * We can approximate the MLE by iterative methods.

Newton-Raphson
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. note::

    * We start from an initial estimate :math:`\theta'`.

      * MOM can give us a good starting point.

    * We assume that the true optimal value :math:`\theta^*` lies in the vicinity of this initial guess.
    * We apply a first-order Taylor approximation to the derivative of the log likelihood around :math:`\theta'` and set it to zero at the optimum:

      .. math:: 0=\ell_n'(\theta^*)\approx\ell_n'(\theta')+(\theta^*-\theta')\ell_n''(\theta')\implies\theta^*\approx\theta'-\frac{\ell_n'(\theta')}{\ell_n''(\theta')}

    * The right-hand side becomes the new estimate, and the step is repeated until the estimate stops changing.

The EM Algorithm
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. note::

    * TODO add more details
    * Assume hidden (latent) variables for which the joint likelihood with the observed data is easier to compute.

********************************************************************************
Bayesian Inference
********************************************************************************

Maximum A Posteriori Estimator (MAP)
================================================================================

Common Estimators
--------------------------------------------------------------------------------

Bernoulli
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Normal
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Minimum Mean Squared Error Estimator (MMSE)
================================================================================
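
The Bernoulli case listed under MAP can serve as a rough sketch of how the two Bayesian estimators named in this part differ. Everything below rests on assumptions that the notes themselves do not state: a :math:`\text{Beta}(a,b)` prior on :math:`p`, the shorthand :math:`s=\sum_{i=1}^n x_i` for the number of successes, and the standard reading of the MMSE estimator as the posterior mean. With those assumptions, the posterior is proportional to likelihood times prior:

.. math:: \pi(p\mid x)\propto p^{s}(1-p)^{n-s}\cdot p^{a-1}(1-p)^{b-1}=p^{a+s-1}(1-p)^{b+n-s-1}

This is the kernel of a :math:`\text{Beta}(a+s,\,b+n-s)` density. Its mode gives the MAP estimate (valid when :math:`a+s>1` and :math:`b+n-s>1`) and its mean gives the MMSE estimate:

.. math:: \widehat{p}_{\text{MAP}}=\frac{a+s-1}{a+b+n-2},\qquad\widehat{p}_{\text{MMSE}}=\mathbb{E}[p\mid x]=\frac{a+s}{a+b+n}

Under a uniform prior (:math:`a=b=1`) the MAP estimate reduces to :math:`s/n`, the sample mean, while the posterior mean is :math:`(s+1)/(n+2)`.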