Non-Parametric Methods¶
Estimation of CDF¶
Empirical distribution function as an estimator¶
The empirical distribution function \(\widehat{F}_n\) estimates any CDF \(F\). It is the CDF of the discrete distribution that assigns mass \(1/n\) to every point in the sample \(\{X_i\}_{i=1}^n\).
Note
Let
\[\begin{split}I(X_i\leq x)=\begin{cases}1 & \text{if $X_i\leq x$}\\ 0 & \text{otherwise}\end{cases}\end{split}\]
Then
\[\widehat{F}_n(x)=\frac{1}{n}\sum_{i=1}^n I(X_i\leq x)\]
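The definition above translates almost directly into code. The following is a minimal sketch, where `ecdf` is a hypothetical helper name. Instead of summing the indicators on every call, it sorts the sample once and uses a binary search to count how many \(X_i\) satisfy \(X_i\leq x\).

```python
import bisect


def ecdf(sample):
    """Return the empirical CDF F_n of `sample` as a callable.

    F_n(x) = (1/n) * #{i : X_i <= x}, i.e. the average of the
    indicators I(X_i <= x).
    """
    xs = sorted(sample)
    n = len(xs)

    def F_n(x):
        # bisect_right returns the number of sorted values that are <= x
        return bisect.bisect_right(xs, x) / n

    return F_n


F = ecdf([3.1, 0.4, 2.2, 0.4, 5.0])
print(F(0.0), F(0.4), F(2.5), F(10.0))  # 0.0 0.4 0.6 1.0
```

Note that the repeated value 0.4 contributes a jump of \(2/n\), which matches the indicator sum.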
Attention
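Unbiased: \(\mathbb{E}[\widehat{F}_n(x)]=F(x)\), because \(n\widehat{F}_n(x)\sim\text{Binomial}(n,F(x))\).
\(\text{se}_F^2=\mathbb{V}_F(\widehat{F}_n(x))=\frac{F(x)(1-F(x))}{n}\). Since the estimator is unbiased, \(\text{mse}(\widehat{F}_n(x))=\text{se}_F^2\to 0\) as \(n\to\infty\).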
The empirical distribution function is therefore a consistent estimator of \(F(x)\) at every \(x\), for any distribution \(F\):
\[\widehat{F}_n(x)\xrightarrow[]{P}F(x)\]
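The mean and variance claims can be checked by simulation. This is a rough sketch under an assumed setup that is not part of the notes. It draws from Uniform(0, 1), so that \(F(x)=x\), and the values `x = 0.3`, `n = 50` and the replicate count are arbitrary. Across replicates, the average of \(\widehat{F}_n(x)\) should be close to \(x\), and its variance should be close to \(x(1-x)/n\).

```python
import random
import statistics


def simulate_ecdf_at(x, n, reps, seed=0):
    """Draw `reps` Uniform(0,1) samples of size n; return F_n(x) for each."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        sample = [rng.random() for _ in range(n)]
        # F_n(x) = fraction of sample points <= x
        values.append(sum(u <= x for u in sample) / n)
    return values


x, n = 0.3, 50
vals = simulate_ecdf_at(x, n, reps=10_000)
print(statistics.fmean(vals), x)                      # empirical mean vs F(x)
print(statistics.pvariance(vals), x * (1 - x) / n)   # empirical var vs F(x)(1-F(x))/n
```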
Confidence interval for CDF estimator¶
Note
Glivenko-Cantelli Theorem: \(\|\widehat{F}_n-F\|_\infty=\sup_{x}|\widehat{F}_n(x)-F(x)|\xrightarrow[]{\text{a.s.}} 0\).
Dvoretzky-Kiefer-Wolfowitz (DKW) Inequality: For any \(\epsilon>0\),
\[\mathbb{P}\left(\sup_x|\widehat{F}_n(x)-F(x)|>\epsilon\right) \le 2\exp(-2n\epsilon^2)\]
Tip
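DKW gives a \(1-\alpha\) confidence band of width \(2\epsilon_n\) around \(\widehat{F}_n\): \(L(x)=\max\{\widehat{F}_n(x)-\epsilon_n,0\}\), \(U(x)=\min\{\widehat{F}_n(x)+\epsilon_n,1\}\), where \(\epsilon_n=\sqrt{\frac{1}{2n}\ln(\frac{2}{\alpha})}\). The band covers \(F(x)\) simultaneously for all \(x\).
Derivation: set the DKW bound equal to \(\alpha\), i.e. \(2\exp(-2n\epsilon_n^2)=\alpha\). Solving for \(\epsilon_n\) gives the expression above, so
\[\mathbb{P}\left(\sup_x|\widehat{F}_n(x)-F(x)|\le\epsilon_n\right)\ge 1-\alpha\]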
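The band can be computed directly from the DKW formula. The following is a minimal sketch, and `dkw_band` is a hypothetical helper name. It returns \(\epsilon_n\) together with a function giving the clipped lower and upper bounds at any \(x\).

```python
import bisect
import math


def dkw_band(sample, alpha=0.05):
    """Return (eps_n, band), where band(x) = (L(x), U(x)) is the DKW
    1 - alpha confidence band for F, valid simultaneously for all x."""
    xs = sorted(sample)
    n = len(xs)
    eps = math.sqrt(math.log(2 / alpha) / (2 * n))  # eps_n from DKW

    def band(x):
        fx = bisect.bisect_right(xs, x) / n            # empirical CDF at x
        return max(fx - eps, 0.0), min(fx + eps, 1.0)  # clip to [0, 1]

    return eps, band


eps, band = dkw_band([0.2, 1.7, 0.9, 3.3, 2.4, 1.1, 0.5, 2.8], alpha=0.05)
print(round(eps, 3), band(1.0))
```

Because \(\epsilon_n\) shrinks like \(1/\sqrt{n}\), halving the band width requires roughly four times as many observations.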
Plug-in Estimator for Statistical Functionals¶
The plug-in estimator \(\widehat{T}_n\) of a statistical functional \(T(F)\) is obtained by replacing \(F\) with \(\widehat{F}_n\), i.e. \(\widehat{T}_n=T(\widehat{F}_n)\).
Estimator for mean¶
Note
Here \(T(F)=\int x\,dF(x)\). Since \(\widehat{F}_n\) is discrete with mass \(1/n\) at each \(X_i\), the integral becomes an average:
\[\widehat{T}_n=T(\widehat{F}_n)=\frac{1}{n}\sum_{i=1}^nX_i=\bar{X}\]
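\(\text{se}_F^2=\mathbb{V}_F(\widehat{T}_n)=\frac{\sigma^2}{n}\), where \(\sigma^2=\mathbb{V}_F(X)\).
By the CLT, this estimator is asymptotically normal: \(\frac{\bar{X}-\mu}{\text{se}_F}\xrightarrow[]{d}N(0,1)\), where \(\mu=\mathbb{E}[X]\).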
Tip
\(\text{se}_F\) depends on the true distribution \(F\) through \(\sigma^2\).
Since \(\sigma^2\) is usually unknown, it must be estimated too, for example with the plug-in variance estimator below.
Let \(\widehat{\text{se}}_n\) denote the resulting estimate of \(\text{se}_F\). Assuming asymptotic normality, an approximate \(1-\alpha\) confidence interval is
\[T(\widehat{F}_n)\pm z_{\alpha/2}\widehat{\text{se}}_n\]
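The interval can be sketched for the sample mean as follows. The assumption here is that \(\sigma^2\) is replaced by the plug-in variance \(S_n^2\) (with the \(1/n\) normalisation defined in the variance subsection), giving \(\widehat{\text{se}}_n=\sqrt{S_n^2/n}\). `mean_normal_ci` is a hypothetical name, and \(z_{\alpha/2}\) comes from the standard library's `NormalDist.inv_cdf`.

```python
import math
from statistics import NormalDist


def mean_normal_ci(sample, alpha=0.05):
    """Approximate 1 - alpha normal CI for the mean using plug-in quantities."""
    n = len(sample)
    xbar = sum(sample) / n                          # T(F_n) = sample mean
    s2 = sum((x - xbar) ** 2 for x in sample) / n   # plug-in variance S_n^2
    se_hat = math.sqrt(s2 / n)                      # estimated se of the mean
    z = NormalDist().inv_cdf(1 - alpha / 2)         # z_{alpha/2}, about 1.96 for alpha=0.05
    return xbar - z * se_hat, xbar + z * se_hat


print(mean_normal_ci([1, 2, 3, 4, 5]))
```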
Estimator for variance¶
Note
Here \(T(F)=\int (x-\mu)^2\,dF(x)\) with \(\mu=\int x\,dF(x)\). Since \(\widehat{F}_n\) is discrete with mass \(1/n\) at each \(X_i\),
\[\widehat{T}_n=T(\widehat{F}_n)=\frac{1}{n}\sum_{i=1}^n(X_i-\bar{X})^2=S^2_n\]
Plugging \(S^2_n\) in for \(\sigma^2\) gives the estimated standard error of the sample mean: \(\widehat{\text{se}}^2_n=S^2_n/n\).
Tip
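The same approach gives a plug-in estimator for any moment of \(F\). For example, the \(k\)-th moment \(\int x^k\,dF(x)\) is estimated by \(\frac{1}{n}\sum_{i=1}^nX_i^k\).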
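Because every plug-in moment is just an average over the sample with weight \(1/n\), a single helper covers raw and central moments. The following is a minimal sketch, and `plugin_moment` is a hypothetical name. With `k=2, central=True` it returns \(S_n^2\), using \(1/n\) rather than \(1/(n-1)\).

```python
def plugin_moment(sample, k, central=False):
    """Plug-in estimate of the k-th raw (or central) moment of F:
    integrate against F_n, i.e. average over the sample."""
    n = len(sample)
    center = sum(sample) / n if central else 0.0  # X-bar for central moments
    return sum((x - center) ** k for x in sample) / n


data = [1, 2, 3, 4, 5]
print(plugin_moment(data, 1), plugin_moment(data, 2), plugin_moment(data, 2, central=True))  # 3.0 11.0 2.0
```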
Estimator for other functionals¶
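Plug-in estimators for other functionals are obtained the same way, by evaluating \(T\) at \(\widehat{F}_n\). For example, the plug-in estimator of the median \(T(F)=F^{-1}(1/2)\) is a sample median.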
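A quantile functional illustrates a plug-in estimator with no closed-form moment expression. The following sketch assumes the definition \(F^{-1}(p)=\inf\{x: F(x)\geq p\}\), and `plugin_quantile` is a hypothetical name. Since \(\widehat{F}_n\) jumps by \(1/n\) at each order statistic, the plug-in value is the order statistic \(X_{(k)}\) with \(k=\lceil np\rceil\).

```python
import math


def plugin_quantile(sample, p):
    """Plug-in estimate of F^{-1}(p) = inf{x : F(x) >= p} with F replaced by F_n."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    xs = sorted(sample)
    k = math.ceil(len(xs) * p)  # smallest k with k/n >= p
    return xs[k - 1]            # order statistic X_(k), 1-based


print(plugin_quantile([7, 1, 5, 3, 9], 0.5))  # 5
```

For even \(n\) this definition picks the lower of the two middle values, whereas Python's `statistics.median` averages them. Both are common conventions for the sample median.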
Tip
\(\text{se}_F\) often has to be estimated in order to obtain a confidence interval.
Since the estimator is itself a statistic (a function of the sample), its variance can be estimated with the following methodology.
Variance Estimation of a Statistic for CI¶
We’re interested in estimating the variance of a statistic \(g(X_1,\cdots,X_n)\) given the sample.
Bootstrap¶
Key Idea¶
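Let \(X^*=(X^*_1,\cdots,X^*_n)\) be a bootstrap sample, obtained from the original sample \((x_1,\cdots,x_n)\) by drawing \(n\) values with replacement. Equivalently, \(X^*\) is an i.i.d. sample of size \(n\) from \(\widehat{F}_n\).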
Note
Let \(Y=g(X^*_1,\cdots,X^*_n)\), and let \(Y_1,\cdots,Y_B\) be its values on \(B\) independent bootstrap samples. As \(B\to\infty\):
WLLN: \(\frac{1}{B}\sum_{i=1}^BY_i\xrightarrow[]{P}\mathbb{E}[Y]\)
\(\frac{1}{B}\sum_{i=1}^Bh(Y_i)\xrightarrow[]{P}\mathbb{E}[h(Y)]\)
\(\frac{1}{B}\sum_{i=1}^B(Y_i-\bar{Y})^2=\frac{1}{B}\sum_{i=1}^B Y_i^2-\left(\frac{1}{B}\sum_{i=1}^B Y_i\right)^2\xrightarrow[]{P}\mathbb{E}[Y^2]-(\mathbb{E}[Y])^2=\mathbb{V}(Y)\)
Tip
We can therefore estimate the variance of a statistic by the sample variance of its values over \(B\) simulated samples.
Obtaining the variance of an estimator¶
Let the estimator for \(T(F)\) be \(\widehat{T}_n=g(X_1,\cdots,X_n)\).
Note
For \(i=1\) to \(B\):
Obtain a simulated sample \(X_i^*=(X^*_{i,1},\cdots,X^*_{i,n})\) by drawing \(n\) values with replacement from \((X_1,\cdots,X_n)\).
Compute the estimate \(\widehat{T}^*_{n,i}=g(X^*_{i,1},\cdots,X^*_{i,n})\).
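Compute the bootstrap variance
\[v_{\text{boot}}=\frac{1}{B}\sum_{i=1}^B\left(\widehat{T}^*_{n,i}-\frac{1}{B}\sum_{j=1}^B\widehat{T}^*_{n,j}\right)^2\]
Use the estimation strategy
\[\mathbb{V}_F(\widehat{T}_n)\approx\mathbb{V}_{\widehat{F}_n}(\widehat{T}_n)\approx v_{\text{boot}}\]
The first approximation replaces \(F\) by \(\widehat{F}_n\). The second is simulation error, which vanishes as \(B\to\infty\).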
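The loop above can be sketched in Python as follows. `bootstrap_variance` is a hypothetical helper, `stat` plays the role of \(g\), and resampling with replacement uses `random.Random.choices`. `statistics.pvariance` applies the same \(1/B\) normalisation as \(v_{\text{boot}}\).

```python
import random
import statistics


def bootstrap_variance(sample, stat, B=2000, seed=None):
    """v_boot: variance (1/B normalisation) of `stat` over B bootstrap samples,
    each drawn with replacement from `sample` (i.e. i.i.d. from F_n)."""
    rng = random.Random(seed)
    n = len(sample)
    # T*_{n,i} = g(X*_{i,1}, ..., X*_{i,n}) for i = 1..B
    reps = [stat(rng.choices(sample, k=n)) for _ in range(B)]
    return statistics.pvariance(reps)  # (1/B) * sum (T*_i - mean of T*)^2


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(bootstrap_variance(data, statistics.fmean, B=20_000, seed=1))
```

For the sample mean, \(\mathbb{V}_{\widehat{F}_n}(\bar{X}^*)=S_n^2/n\) exactly (0.825 for this data), so the printed value should be close to that number. This gives a convenient sanity check on the middle term of the approximation.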
Tip
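Taking \(\widehat{\text{se}}_{\text{boot}}=\sqrt{v_{\text{boot}}}\), an approximate \(1-\alpha\) normal confidence interval is \(\widehat{T}_n\pm z_{\alpha/2}\widehat{\text{se}}_{\text{boot}}\).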
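As an illustration of the interval for a functional without a simple standard-error formula, the following sketch applies it to the median. `bootstrap_normal_ci` is a hypothetical name, and the normal interval assumes the estimator is approximately normal. For the median this is an approximation, and other bootstrap intervals (e.g. percentile intervals) exist.

```python
import math
import random
import statistics
from statistics import NormalDist


def bootstrap_normal_ci(sample, stat, alpha=0.05, B=2000, seed=None):
    """T_hat +/- z_{alpha/2} * sqrt(v_boot), with v_boot from B bootstrap samples."""
    rng = random.Random(seed)
    n = len(sample)
    reps = [stat(rng.choices(sample, k=n)) for _ in range(B)]
    se_boot = math.sqrt(statistics.pvariance(reps))  # sqrt(v_boot)
    z = NormalDist().inv_cdf(1 - alpha / 2)          # z_{alpha/2}
    t_hat = stat(sample)                              # plug-in estimate on the data
    return t_hat - z * se_boot, t_hat + z * se_boot


print(bootstrap_normal_ci(list(range(1, 22)), statistics.median, seed=0))
```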
Jackknife¶
Note
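Instead of resampling with replacement, the jackknife builds \(n\) new samples of size \(n-1\). The \(i\)-th sample omits the observation \(X_i\).
Computing the statistic on each gives the leave-one-out estimates \(\widehat{T}_{(-i)}\), \(i=1,\cdots,n\). Their spread is scaled differently from the bootstrap formula:
\[v_{\text{jack}}=\frac{n-1}{n}\sum_{i=1}^n\left(\widehat{T}_{(-i)}-\bar{T}\right)^2,\qquad\bar{T}=\frac{1}{n}\sum_{i=1}^n\widehat{T}_{(-i)}\]
As with the bootstrap, \(\sqrt{v_{\text{jack}}}\) is then used as \(\widehat{\text{se}}\) to compute a CI.
The jackknife needs exactly \(n\) evaluations of the statistic and no random resampling, so it is typically cheaper than the bootstrap's \(B\) evaluations.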
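The jackknife variance formula can be sketched as follows. `jackknife_variance` is a hypothetical name, and `sample` is assumed to be a Python list so that slicing removes one observation at a time. For the sample mean, \(v_{\text{jack}}\) reduces algebraically to \(s^2/n\), where \(s^2\) is the unbiased sample variance, which gives a deterministic check.

```python
import statistics


def jackknife_variance(sample, stat):
    """v_jack = (n-1)/n * sum_i (T_(-i) - T_bar)^2, where T_(-i) is `stat`
    computed with observation i left out and T_bar is their average."""
    n = len(sample)
    loo = [stat(sample[:i] + sample[i + 1:]) for i in range(n)]  # leave-one-out estimates
    t_bar = sum(loo) / n
    return (n - 1) / n * sum((t - t_bar) ** 2 for t in loo)


data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
print(jackknife_variance(data, statistics.fmean))
print(statistics.variance(data) / len(data))  # same value up to rounding: v_jack = s^2 / n for the mean
```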