Law of Large Numbers

March 31, 2023 - A fundamental principle stating that as the sample size of a random variable increases, the sample mean approaches the true mean of the underlying population.

The more data we collect, the more accurate our estimate of the true population mean becomes. The Law of Large Numbers is an important concept in the development of statistical models and methods for analyzing data.
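To make this concrete, here is a quick simulation sketch (my own illustration, not part of the original derivation): we roll a fair six-sided die, whose true mean is 3.5, and watch the sample mean home in on that value as the sample size grows. The sample sizes and the seed are arbitrary choices.

```python
import random

# A minimal sketch: estimate the mean of a fair six-sided die, whose true
# mean is 3.5, with growing sample sizes and watch the sample mean approach
# the true value.
random.seed(42)

TRUE_MEAN = 3.5
for n in [10, 100, 1_000, 10_000, 100_000]:
    rolls = [random.randint(1, 6) for _ in range(n)]
    sample_mean = sum(rolls) / n
    print(f"n = {n:>6}: sample mean = {sample_mean:.4f}, "
          f"error = {abs(sample_mean - TRUE_MEAN):.4f}")
```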


Theorem: Law of Large Numbers

Let $$X_1, \dots, X_n$$ be a sequence of i.i.d. (independent and identically distributed) random variables with $$E[X_i] = \mu \quad \text{and} \quad Var[X_i] < \infty.$$ Then, for the sample mean $$\hat{\mu} = \frac{X_1 + \dots + X_n}{n}$$ and every $$\epsilon > 0,$$ it holds that: $$\lim_{n \rightarrow \infty} P(|\hat{\mu} - \mu| \geq \epsilon) = 0$$
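The statement is about convergence in probability, which we can also illustrate numerically. The sketch below (my own illustration, using Bernoulli(0.5) variables and an arbitrary repetition count) estimates $$P(|\hat{\mu} - \mu| \geq \epsilon)$$ for growing n and shows it shrinking towards 0.

```python
import random

# Empirical check of the theorem's claim: for a fixed epsilon, the probability
# that the sample mean deviates from mu by at least epsilon shrinks towards 0
# as n grows. X_i ~ Bernoulli(0.5), so mu = 0.5; the repetition count is an
# arbitrary choice for this sketch.
random.seed(0)

MU, EPSILON, REPETITIONS = 0.5, 0.05, 1_000

for n in [10, 100, 1_000, 10_000]:
    deviations = 0
    for _ in range(REPETITIONS):
        # One experiment: draw n Bernoulli(0.5) samples and form the sample mean.
        sample_mean = sum(random.random() < MU for _ in range(n)) / n
        if abs(sample_mean - MU) >= EPSILON:
            deviations += 1
    print(f"n = {n:>5}: P(|mu_hat - mu| >= {EPSILON}) ~ {deviations / REPETITIONS:.3f}")
```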


Proof of the Law of Large Numbers

The proof uses Chebyshev's Inequality, which states the following:

For a random variable X with finite expectation and variance, i.e. $$E[X] < \infty \quad \text{and} \quad Var(X) < \infty,$$ and any $$\lambda > 0,$$ we have: $$P(|X - E[X]| \geq \lambda) \leq \frac{Var(X)}{\lambda^2}$$ where Var(X) denotes the variance of X.
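Before we use the inequality, here is a quick numerical sanity check (a sketch under the assumption of uniform samples on [0, 1], where the expectation and variance are known in closed form): the empirical tail probability should always stay below the Chebyshev bound.

```python
import random

# Numerical sanity check of Chebyshev's Inequality (a sketch, not a proof):
# for X uniform on [0, 1] we know E[X] = 0.5 and Var(X) = 1/12. Compare the
# empirical tail probability P(|X - E[X]| >= lambda) with the bound
# Var(X) / lambda^2 for a few values of lambda.
random.seed(1)

EX, VAR = 0.5, 1 / 12
samples = [random.random() for _ in range(100_000)]

for lam in [0.2, 0.3, 0.4]:
    empirical = sum(abs(x - EX) >= lam for x in samples) / len(samples)
    bound = VAR / lam**2
    print(f"lambda = {lam}: empirical = {empirical:.4f} <= bound = {bound:.4f}")
```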

Applying Chebyshev's Inequality to the sample mean:

$$P(|\hat{\mu} - E[\hat{\mu}]| \geq \epsilon) \leq \frac{Var[\hat{\mu}]}{\epsilon^2}$$

Note that we substituted $$\lambda = \epsilon$$ to be consistent with the literature on the Law of Large Numbers. We rewrite the two quantities involved:

$$1. \quad E[\hat{\mu}] = E\left[\frac{X_1 + \dots + X_n}{n}\right] = \frac{1}{n} E[X_1 + \dots + X_n] \underbrace{=}_{X_i\text{'s identically dist.}} \frac{n}{n}\mu = \mu$$

where we used $$E[X_i] = \mu$$ for each of the n summands.

$$2. \quad Var[\hat{\mu}] = Var\left[\frac{X_1 + \dots + X_n}{n}\right] = \frac{1}{n^2}Var[X_1 + \dots + X_n] \underbrace{=}_{X_i\text{'s independent}} \frac{1}{n^2}\left(Var[X_1] + \dots + Var[X_n]\right)$$ $$\underbrace{=}_{X_i\text{'s identically dist.}} \frac{n}{n^2}Var[X_i] = \frac{1}{n}Var[X_i]$$

Now we can put 1. and 2. into Chebyshev's Inequality above:

$$P(|\hat{\mu} - \mu| \geq \epsilon) \leq \frac{Var[X_i]/n}{\epsilon^2} \Longleftrightarrow P(|\hat{\mu} - \mu| \geq \epsilon) \leq \frac{Var[X_i]}{n\epsilon^2}$$

Now take the limit as n goes to infinity:

$$\lim_{n \rightarrow \infty} P(|\hat{\mu} - \mu| \geq \epsilon) \leq \lim_{n \rightarrow \infty} \frac{Var[X_i]}{n\epsilon^2}$$

As $$Var[X_i] < \infty$$ (an assumption of the theorem, see above), the constant factor can be pulled out of the limit:

$$\lim_{n \rightarrow \infty} P(|\hat{\mu} - \mu| \geq \epsilon) \leq \frac{Var[X_i]}{\epsilon^2} \lim_{n \rightarrow \infty} \frac{1}{n} = 0$$

Since a probability is by definition bounded between 0 and 1 (it can't be smaller than 0), we arrive at the Law of Large Numbers statement:

$$\lim_{n \rightarrow \infty} P(|\hat{\mu} - \mu| \geq \epsilon) = 0$$
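The derived bound $$\frac{Var[X_i]}{n\epsilon^2}$$ can also be checked empirically. The following sketch (again with uniform samples on [0, 1] as an assumed example distribution, where $$\mu = 0.5$$ and $$Var[X_i] = \frac{1}{12}$$) estimates the deviation probability for several n and compares it to the bound, which decays like 1/n.

```python
import random

# Sketch checking the bound derived above: P(|mu_hat - mu| >= eps) should
# stay below Var[X_i] / (n * eps^2) and shrink as n grows. X_i is uniform on
# [0, 1], so mu = 0.5 and Var[X_i] = 1/12 (a known closed form).
random.seed(2)

MU, VAR, EPS, REPS = 0.5, 1 / 12, 0.05, 1_000

for n in [50, 200, 800, 3_200]:
    hits = sum(
        abs(sum(random.random() for _ in range(n)) / n - MU) >= EPS
        for _ in range(REPS)
    )
    print(f"n = {n:>5}: empirical = {hits / REPS:.4f}, "
          f"bound Var/(n*eps^2) = {VAR / (n * EPS**2):.4f}")
```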


Final Remarks:

The Law of Large Numbers holds in the limit, meaning that given enough trials/samples, we can argue that the sample mean converges to the true mean of the population.

Even though it is called a "law", it is just a theorem: its conclusion only holds under its assumptions. Not all data obeys the law!

This is especially true if your data is not identically distributed or your samples are not independent; these are crucial assumptions for the theorem to actually hold. The same goes for the assumption of a finite mean and variance, as the sketch below illustrates.
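A classic illustration of this caveat (my own example, not from the derivation above) is the standard Cauchy distribution: its samples are i.i.d., but it has no finite mean or variance, so the theorem's assumptions fail and the sample mean never settles down.

```python
import math
import random

# Sketch of the caveat: the standard Cauchy distribution has no finite mean,
# so the theorem's assumptions (E[X_i] = mu, Var[X_i] < infinity) fail even
# though the samples are i.i.d. The sample mean keeps jumping around instead
# of settling down, no matter how large n becomes.
random.seed(3)

def cauchy_sample() -> float:
    # Inverse-CDF sampling: tan(pi * (U - 1/2)) is standard Cauchy for U ~ Uniform(0, 1).
    return math.tan(math.pi * (random.random() - 0.5))

for n in [100, 10_000, 1_000_000]:
    sample_mean = sum(cauchy_sample() for _ in range(n)) / n
    print(f"n = {n:>7}: Cauchy sample mean = {sample_mean:.3f}")
```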

Author: Dustin