
Expected Value, Variance

April 2, 2023 - In probability theory, moments of a distribution are numerical measures that capture important features of the distribution. The first two moments of a distribution are the expectation (or mean) and variance.

Definition: Expected Value (Discrete Case)

The expected value of a random variable is a fundamental concept in probability theory and statistics. It represents the average value that would be obtained if the random variable were repeatedly sampled many times.

It is calculated by summing the product of each possible value of X and its probability of occurrence. E(X) is the weighted average of all possible values of X, where the weights are the probabilities of each value.

The concept of expected value is particularly useful in decision-making under uncertainty, as it provides a measure of what one can expect to gain or lose on average in a given situation. For example, in a game of chance, knowing the expected value of a bet can help a player make a more informed decision about whether to place that bet.
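As a minimal sketch of the game-of-chance idea (the payoffs and probabilities below are made up for illustration), the expected value of a bet is just the probability-weighted average of its outcomes:

```python
# Hypothetical bet: win 10 with probability 0.3, lose 5 with probability 0.7.
values = [10, -5]
probs = [0.3, 0.7]

# E(X) = sum over x of x * p(x): the probability-weighted average outcome.
expected_value = sum(x * p for x, p in zip(values, probs))
print(expected_value)  # 10*0.3 - 5*0.7 = -0.5
```

Since the expected value is negative, a player repeatedly placing this bet would lose 0.5 per round on average.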

$$E_{p(x)}[X] = \sum_{x \in X} xp(x)$$

A key property of the expectation is linearity over sums of random variables:

$$E_{p(x_1,\dots,x_n)}\left[\sum_{i=1}^n X_i\right] = \sum_{x_1}\dots \sum_{x_n}p(x_1,\dots,x_n)(x_1+\dots+x_n)$$ $$= \sum_{x_1}x_1\sum_{x_2} \dots \sum_{x_n}p(x_1,\dots,x_n) + \sum_{x_2}x_2\sum_{x_1}\sum_{x_3} \dots \sum_{x_n}p(x_1,\dots,x_n)+\dots+\sum_{x_n}x_n\sum_{x_1}\dots\sum_{x_{n-1}} p(x_1,\dots,x_n)$$ $$\underbrace{=}_{\text{marginalization (sum rule)}} \sum_{x_1}x_1p(x_1) + \dots + \sum_{x_n}x_np(x_n)$$ $$= \sum_{i=1}^n E_{p(x_i)}[X_i]$$

Linearity also holds for scaling by a scalar $a$ and shifting by a scalar $b$:

$$E_{p(x)}[aX + b] = \sum_{x \in X} (ax+b)p(x)$$ $$= \sum_{x \in X} \left(axp(x)+bp(x)\right)$$ $$= \sum_{x \in X} axp(x)+\sum_{x \in X} bp(x)$$ $$= a \sum_{x \in X} xp(x)+b\sum_{x \in X} p(x)$$

As $$\sum_{x \in X} p(x) = 1,$$ we get $$a \sum_{x \in X} xp(x)+b\sum_{x \in X} p(x) = aE_{p(x)}[X]+b$$

Two random variables are independent, e.g. X independent of Y, if it holds: $$p(X=x,Y=y) = p(X=x)p(Y=y) \quad \forall x,y$$ In short: $$p(X,Y)=p(X)p(Y)$$

The expectation of a product of two independent random variables then factorizes:

$$E_{p(x,y)}[XY] = \sum_{x \in X}\sum_{y \in Y}p(x,y)xy$$ $$= \sum_{x \in X}\sum_{y \in Y}p(x)p(y)xy$$ $$= \sum_{x \in X}p(x)x \sum_{y \in Y}p(y)y$$ $$= E_{p(x)}[X]E_{p(y)}[Y]$$
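The identities above can be checked numerically on small discrete distributions. This is a minimal sketch; the two distributions and the scalars are arbitrary illustrative choices, and independence is encoded by using the factorized joint $p(x,y) = p(x)p(y)$:

```python
# Two small discrete distributions, assumed independent of each other.
px = {1: 0.2, 2: 0.5, 3: 0.3}  # p(x)
py = {0: 0.6, 4: 0.4}          # p(y)

def E(dist):
    """E[X] = sum over x of x * p(x)."""
    return sum(x * p for x, p in dist.items())

# Linearity with scalars: E[aX + b] = a*E[X] + b
a, b = 3.0, -1.0
lhs = sum((a * x + b) * p for x, p in px.items())
assert abs(lhs - (a * E(px) + b)) < 1e-12

# Product of independent RVs: E[XY] = E[X] * E[Y],
# computed via the factorized joint p(x, y) = p(x) * p(y).
e_xy = sum(x * y * p * q for x, p in px.items() for y, q in py.items())
assert abs(e_xy - E(px) * E(py)) < 1e-12
print(E(px), E(py), e_xy)
```

A numerical check like this does not prove the identities, but it is a quick sanity test of the derivations.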


Definition: Variance

The variance of a random variable is a measure of how much the values of the random variable deviate from its expected value. Given a random variable X, Var(X) is the expected value of the squared deviation of X from its expected value.

The variance provides a measure of the spread or variability of the distribution of a random variable. A large variance indicates that the values of the random variable are widely dispersed around its expected value, while a small variance indicates that the values are tightly clustered around the expected value.

The concept of variance is useful in many fields, such as finance, physics, and engineering. In finance, for example, the variance of a stock's returns can be used to measure the risk associated with investing in that stock.

$$Var_{p(x)}[X] = E_{p(x)}[(X-E_{p(x)}[X])^2]$$

We can use the second binomial formula $$(a-b)^2= a^2-2ab+b^2$$ to rewrite the variance:

$$Var_{p(x)}[X] = E_{p(x)}[X^2-2E_{p(x)}[X]X + E_{p(x)}[X]^2]$$ $$\underbrace{=}_{\text{linearity of Exp.}} E_{p(x)}[X^2] - E_{p(x)}[2E_{p(x)}[X]X] + E_{p(x)}[E_{p(x)}[X]^2]$$

As $$E_{p(x)}[X]$$ is a constant and the expectation of a constant is that constant, we get:

$$E_{p(x)}[X^2] - E_{p(x)}[2E_{p(x)}[X]X] + E_{p(x)}[E_{p(x)}[X]^2]$$ $$= E_{p(x)}[X^2] - 2E_{p(x)}[X]E_{p(x)}[X] + E_{p(x)}[X]^2 = E_{p(x)}[X^2] - E_{p(x)}[X]^2$$

Linearity over sums of random variables does NOT hold for the variance in general: $$Var\left[\sum_{i=1}^n X_i\right] \neq \sum_{i=1}^n Var[X_i]$$

However, it does hold IF the random variables (RVs) are independent:

$$Var_{p(x_1,\dots,x_n)}\left[\sum_{i=1}^n X_i\right]$$ $$= E_{p(x_1,\dots,x_n)}\left[\left(\sum_{i=1}^n X_i\right)^2\right] - E_{p(x_1,\dots,x_n)}\left[\sum_{i=1}^n X_i\right]^2$$ $$= E_{p(x_1,\dots,x_n)}\left[\sum_{i=1}^n X_i^2+2\sum_{i<j} X_iX_j\right] - \left(\sum_{i=1}^n E_{p(x_i)}[X_i]\right)^2$$ $$= \sum_{i=1}^n E_{p(x_i)}[X_i^2]+2\sum_{i<j} E_{p(x_i, x_j)}[X_iX_j] - \sum_{i=1}^n E_{p(x_i)}[X_i]^2 - 2\sum_{i<j} E_{p(x_i)}[X_i]E_{p(x_j)}[X_j]$$ $$\underbrace{=}_{\text{independent RVs}} \sum_{i=1}^n E_{p(x_i)}[X_i^2]+2\sum_{i<j} E_{p(x_i)}[X_i]E_{p(x_j)}[X_j] - \sum_{i=1}^n E_{p(x_i)}[X_i]^2 -2\sum_{i<j} E_{p(x_i)}[X_i]E_{p(x_j)}[X_j]$$ $$= \sum_{i=1}^n E_{p(x_i)}[X_i^2] - \sum_{i=1}^n E_{p(x_i)}[X_i]^2$$ $$= \sum_{i=1}^n \left(E_{p(x_i)}[X_i^2] - E_{p(x_i)}[X_i]^2\right)$$ $$= \sum_{i=1}^n Var_{p(x_i)}[X_i]$$

Note that we extensively used the linearity of expectation for multiple random variables (from above): $$E_{p(x_1,\dots,x_n)}\left[\sum_{i=1}^n X_i\right] = \sum_{i=1}^n E_{p(x_i)}[X_i]$$

For scaling by a scalar $a$ and shifting by a scalar $b$, it holds:

$$Var_{p(x)}[aX+b] = E_{p(x)}[(aX+b)^2] - E_{p(x)}[aX+b]^2$$ $$= E_{p(x)}[a^2X^2 + 2abX+b^2] - (aE_{p(x)}[X]+b)^2$$ $$= a^2E_{p(x)}[X^2] + 2abE_{p(x)}[X] + b^2 - (a^2E_{p(x)}[X]^2 +2abE_{p(x)}[X] + b^2)$$ $$= a^2(E_{p(x)}[X^2]-E_{p(x)}[X]^2)$$ $$= a^2Var_{p(x)}[X]$$
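The variance identities can be sanity-checked the same way. This is a minimal sketch with arbitrary illustrative distributions; independence of X and Y is again encoded by building the joint as $p(x)p(y)$:

```python
# Two small discrete distributions, assumed independent of each other.
px = {0: 0.5, 2: 0.3, 5: 0.2}  # p(x)
py = {1: 0.4, 3: 0.6}          # p(y)

def E(dist):
    """E[X] = sum over x of x * p(x)."""
    return sum(x * p for x, p in dist.items())

def Var(dist):
    """Var[X] = E[X^2] - E[X]^2."""
    return sum(x**2 * p for x, p in dist.items()) - E(dist)**2

# Scaling and shifting: Var[aX + b] = a^2 * Var[X]
a, b = 2.0, 7.0
scaled = {a * x + b: p for x, p in px.items()}
assert abs(Var(scaled) - a**2 * Var(px)) < 1e-12

# Sum of INDEPENDENT RVs: Var[X + Y] = Var[X] + Var[Y],
# via the distribution of X + Y under the factorized joint p(x)p(y).
joint = {}
for x, p in px.items():
    for y, q in py.items():
        joint[x + y] = joint.get(x + y, 0.0) + p * q
assert abs(Var(joint) - (Var(px) + Var(py))) < 1e-12
print(Var(px), Var(py), Var(joint))
```

Note that the additivity check would fail for correlated variables, matching the warning above that variance is not linear in general.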
Author: Dustin