Non-Parametric Methods¶

Estimation of CDF¶

Empirical distribution function as an estimator¶

The empirical distribution function $\widehat{F}_n$ is an estimator for any CDF $F$: it is the discrete distribution that assigns mass $1/n$ to every point in the sample $\{X_i\}_{i=1}^n$.

Note

Let

\[\begin{split}I(X_i\leq x)=\begin{cases}1 & \text{if $X_i\leq x$}\\ 0 & \text{otherwise}\end{cases}\end{split}\]

Then

\[\widehat{F}_n(x)=\frac{1}{n}\sum_{i=1}^n I(X_i\leq x)\]

Attention

Unbiased: $\mathbb{E}[\widehat{F}_n(x)]=F(x)$
$\text{se}_F^2=\mathbb{V}_F(\widehat{F}_n(x))=\frac{F(x)(1-F(x))}{n}$, and $\lim\limits_{n\to\infty}\text{mse}(\widehat{F}_n(x))=0$ for every fixed $x$.
The empirical distribution function is therefore a consistent estimator for any distribution: for every $x$,

\[\widehat{F}_n(x)\xrightarrow[]{P}F(x)\]
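
The definition above translates almost directly into code. The following is a minimal sketch using only the Python standard library; the function name `ecdf` and the small sample are illustrative assumptions, not part of the notes.

```python
from bisect import bisect_right

def ecdf(sample):
    """Return a function x -> F_hat_n(x) for the given sample."""
    xs = sorted(sample)
    n = len(xs)

    def F_hat(x):
        # bisect_right counts how many sample points satisfy X_i <= x
        return bisect_right(xs, x) / n

    return F_hat

F_hat = ecdf([3.0, 1.0, 2.0, 2.0])
print(F_hat(0.5), F_hat(2.0), F_hat(10.0))  # 0.0 0.75 1.0
```

Sorting once and using binary search keeps each evaluation at $O(\log n)$, which is convenient when $\widehat{F}_n$ is evaluated at many points.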

Confidence interval for CDF estimator¶

Note

Glivenko-Cantelli Theorem: $||\widehat{F}_n-F||_\infty=\sup_{x}|\widehat{F}_n(x)-F(x)|\xrightarrow[]{a.s.} 0$.
Dvoretzky-Kiefer-Wolfowitz (DKW) Inequality: For any $\epsilon>0$,

\[\mathbb{P}\left(\sup_x|\widehat{F}_n(x)-F(x)|>\epsilon\right) \le 2\exp(-2n\epsilon^2)\]

Tip

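From DKW we can form a $1-\alpha$ confidence band of width $2\epsilon_n$ around $\widehat{F}_n$, where $\epsilon_n=\sqrt{\frac{1}{2n}\ln(\frac{2}{\alpha})}$.
Derivation: setting the DKW bound equal to $\alpha$, i.e. $2\exp(-2n\epsilon_n^2)=\alpha$, and solving for $\epsilon_n$ gives the expression above. Hence $\mathbb{P}\left(\widehat{F}_n(x)-\epsilon_n\le F(x)\le\widehat{F}_n(x)+\epsilon_n\text{ for all }x\right)\ge 1-\alpha$.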
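
As a rough illustration, the band $\widehat{F}_n\pm\epsilon_n$ can be computed as follows. This is a sketch only: the name `dkw_band` is hypothetical, and clipping the band to $[0,1]$ is an added assumption (it is harmless because a CDF never leaves that range).

```python
import math
from bisect import bisect_right

def dkw_band(sample, alpha=0.05):
    """Return (lower, upper, eps): band functions and the half-width eps_n."""
    xs = sorted(sample)
    n = len(xs)
    # eps_n = sqrt( ln(2 / alpha) / (2 n) ), from the DKW inequality
    eps = math.sqrt(math.log(2.0 / alpha) / (2.0 * n))

    def F_hat(x):
        return bisect_right(xs, x) / n

    def lower(x):
        return max(F_hat(x) - eps, 0.0)  # clipped at 0 (assumption)

    def upper(x):
        return min(F_hat(x) + eps, 1.0)  # clipped at 1 (assumption)

    return lower, upper, eps

lower, upper, eps = dkw_band(list(range(100)), alpha=0.05)
print(round(eps, 4), lower(49), upper(49))
```

Note that $\epsilon_n$ depends only on $n$ and $\alpha$, not on the data, so the band has the same width everywhere before clipping.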

Plug-in Estimator for Statistical Functionals¶

The plug-in estimator $\widehat{T}_n(F)$ for any statistical functional $T(F)$ is obtained by replacing $F$ with $\widehat{F}_n$, i.e. $\widehat{T}_n(F)=T(\widehat{F}_n)$.

Estimator for mean¶

Note

Here $T(F)=\int x\mathop{dF}$. Since $\widehat{F}_n$ is discrete

\[\widehat{T}_n(F)=T(\widehat{F}_n)=\frac{1}{n}\sum_{i=1}^nX_i=\bar{X}\]

$\text{se}_F^2=\mathbb{V}_F(\widehat{T}_n)=\frac{\sigma^2}{n}$.
CLT says that this estimator is asymptotically normal.

Tip

$\text{se}_F$ depends on the true distribution $F$.
If the true variance $\sigma^2$ is not known, it has to be estimated as well (see the estimator for variance that follows).
Let the estimate for $\text{se}_F$ be $\widehat{\text{se}}_n$. Assuming asymptotic normality, an approximate $1-\alpha$ confidence interval is

\[T(\widehat{F}_n)\pm z_{\alpha/2}\widehat{\text{se}}_n\]
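
The interval above can be sketched for the sample mean as follows. The function name `mean_ci` is hypothetical, and the sketch assumes $\sigma^2$ is unknown and estimated by the plug-in variance (squared deviations averaged over $n$), so that $\widehat{\text{se}}_n$ is the square root of that variance divided by $n$.

```python
import math
from statistics import NormalDist

def mean_ci(sample, alpha=0.05):
    """Normal-based 1 - alpha confidence interval for the mean."""
    n = len(sample)
    x_bar = sum(sample) / n
    # Plug-in variance estimate (divides by n, not n - 1)
    s2 = sum((x - x_bar) ** 2 for x in sample) / n
    se_hat = math.sqrt(s2 / n)
    # z_{alpha/2}: upper alpha/2 quantile of the standard normal
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return x_bar - z * se_hat, x_bar + z * se_hat

print(mean_ci([1.0, 2.0, 3.0, 4.0, 5.0]))
```

With so few observations the normal approximation is crude; the example only shows the mechanics.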

Estimator for variance¶

Note

Here $T(F)=\int (x-\mathbb{E}[X])^2\mathop{dF}$. Since $\widehat{F}_n$ is discrete

\[\widehat{T}_n(F)=T(\widehat{F}_n)=\frac{1}{n}\sum_{i=1}^n(X_i-\bar{X})^2=S^2_n\]

For the sample mean estimator, $\text{se}_F^2=\frac{\sigma^2}{n}$, so $\widehat{\text{se}}^2_n=\frac{S^2_n}{n}$.

Tip

We can use similar techniques for estimating any moments of $F$.
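
For instance, a plug-in estimate of the $k$-th raw or central moment simply averages the corresponding power over the sample. The sketch below is illustrative; the name `plug_in_moment` and its `central` flag are assumptions introduced here.

```python
def plug_in_moment(sample, k, central=False):
    """Plug-in estimate of the k-th moment of F (about the mean if central=True)."""
    n = len(sample)
    # Under F_hat_n, each observation carries mass 1/n
    center = sum(sample) / n if central else 0.0
    return sum((x - center) ** k for x in sample) / n

data = [1.0, 2.0, 3.0, 4.0, 5.0]
print(plug_in_moment(data, 1))                # 3.0, the sample mean
print(plug_in_moment(data, 2, central=True))  # 2.0, the plug-in variance
```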

Estimator for other functionals¶

The estimator can be obtained similarly.

Tip

$\text{se}_F$ often has to be estimated in order to obtain a confidence interval.
Because the estimator is itself a statistic, its variance can be estimated with the resampling methods that follow.

Variance Estimation of a Statistic for CI¶

We’re interested in estimating the variance of a statistic $g(X_1,\cdots,X_n)$ given the sample.

Bootstrap¶

Key Idea¶

Let $X^*=(X^*_1,\cdots,X^*_n)$ be a simulated sample obtained from the original sample $(x_1,\cdots,x_n)$ by drawing $n$ points with replacement.

Note

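Let $Y=g(X^*_1,\cdots,X^*_n)$, and let $Y_1,\cdots,Y_B$ be the values of $Y$ computed from $B$ independent simulated samples.
WLLN: $\frac{1}{B}\sum_{i=1}^BY_i\xrightarrow[]{P}\mathbb{E}[Y]$ as $B\to\infty$.
More generally, for any function $h$ with $\mathbb{E}[|h(Y)|]<\infty$: $\frac{1}{B}\sum_{i=1}^Bh(Y_i)\xrightarrow[]{P}\mathbb{E}[h(Y)]$
Therefore $\frac{1}{B}\sum_{i=1}^B(Y_i-\bar{Y})^2=\frac{1}{B}\sum_{i=1}^B Y_i^2-\left(\frac{1}{B}\sum_{i=1}^B Y_i\right)^2\xrightarrow[]{P}\mathbb{E}[Y^2]-(\mathbb{E}[Y])^2=\mathbb{V}(Y)$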

Tip

We can therefore estimate the variance of a statistic by the sample variance of its values over $B$ simulated samples.

Obtaining the variance of an estimator¶

Let the estimator for $T(F)$ be $\widehat{T}_n=g(X_1,\cdots,X_n)$.

Note

For $i=1$ to $B$:
- Obtain a simulated sample $X_i^*=(X^*_{i,1},\cdots,X^*_{i,n})$.
- Compute the estimate $\widehat{T}^*_{n,i}=g(X^*_{i,1},\cdots,X^*_{i,n})$.
Compute the bootstrap variance

\[v_{\text{boot}}=\frac{1}{B}\sum_{i=1}^B\left(\widehat{T}^*_{n,i}-\frac{1}{B}\sum_{j=1}^B\widehat{T}^*_{n,j}\right)^2\]

Use the estimation strategy

\[\mathbb{V}_F(\widehat{T}_n)\approx\mathbb{V}_{\widehat{F}_n}(\widehat{T}_n)\approx v_{\text{boot}}\]

Tip

We can use $\sqrt{v_{\text{boot}}}$ as the estimate of $\text{se}$ and compute a CI.
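
The procedure above can be sketched in a few lines of standard-library Python. The names `bootstrap_variance`, `statistic`, and `seed` are illustrative assumptions; `statistic` plays the role of $g$ and must accept a list of observations.

```python
import random
from statistics import median

def bootstrap_variance(sample, statistic, B=1000, seed=0):
    """Estimate the variance of statistic(sample) with B bootstrap resamples."""
    rng = random.Random(seed)  # fixed seed only for reproducibility
    n = len(sample)
    estimates = []
    for _ in range(B):
        # Draw n points from the original sample with replacement
        resample = rng.choices(sample, k=n)
        estimates.append(statistic(resample))
    mean_est = sum(estimates) / B
    # v_boot: average squared deviation of the B bootstrap estimates
    return sum((t - mean_est) ** 2 for t in estimates) / B

data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9]
v_boot = bootstrap_variance(data, median)
print(v_boot, v_boot ** 0.5)  # variance and standard-error estimates
```

The median is used here because, unlike the mean, it has no simple closed-form standard error, which is the situation where the bootstrap is most useful.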

Jackknife¶

Note

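Instead of drawing a simulated sample with replacement, we remove one observation at a time and treat the remaining $n-1$ observations as a new sample, which gives $n$ leave-one-out samples.
The statistic is recomputed on each of these samples as in the bootstrap, and $v_{\text{jack}}$ is $\frac{n-1}{n}$ times the sum of squared deviations of the $n$ leave-one-out estimates from their mean; it is then used to compute a CI.
This needs exactly $n$ recomputations of the statistic, so it is less computationally expensive than the bootstrap whenever $n<B$.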
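
A minimal sketch of the leave-one-out computation follows. The name `jackknife_variance` is hypothetical, and the $\frac{n-1}{n}$ scaling is the standard jackknife variance formula, which is assumed here rather than taken from the notes.

```python
def jackknife_variance(sample, statistic):
    """Jackknife variance estimate of statistic(sample)."""
    n = len(sample)
    # Leave-one-out replicates: recompute the statistic without observation i
    replicates = [statistic(sample[:i] + sample[i + 1:]) for i in range(n)]
    mean_rep = sum(replicates) / n
    # Standard jackknife scaling (n - 1) / n applied to the sum of squares
    return (n - 1) / n * sum((t - mean_rep) ** 2 for t in replicates)

def mean(xs):
    return sum(xs) / len(xs)

print(jackknife_variance([1.0, 2.0, 3.0, 4.0, 5.0], mean))
```

For the sample mean, this estimate coincides with the usual unbiased sample variance divided by $n$, which gives a quick sanity check of the implementation.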