
Joint, Marginal, and Conditional Distributions

April 1, 2023 - Joint, marginal, and conditional distributions play a crucial role in capturing the relationships between multiple random variables.

Definition: Joint Probability Distribution (Discrete Case)

The joint distribution describes the probability of observing a particular combination of values for multiple random variables simultaneously.

Say we have two random variables X and Y. The joint probability distribution of those variables is defined as: $$p(X=x, Y=y)$$ It states: The proportion of times the events X=x and Y=y happened together, out of the total number of trials. It holds that $$p(X=x, Y=y) = p(Y=y, X=x) \quad \text{(symmetry)}$$ Generalizing to n random variables $$X_1, \dots, X_n$$ we get for the joint probability distribution: $$p(X_1=x_1, X_2=x_2, \dots, X_n = x_n)$$
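As a small numerical sketch (the trial data and value labels below are made up purely for illustration), the joint distribution can be estimated by counting how often each combination of values occurs:

```python
from collections import Counter

# Made-up trials: each tuple is one joint observation (X=x, Y=y).
trials = [
    ("rain", "cold"), ("rain", "cold"), ("rain", "warm"),
    ("sun", "warm"), ("sun", "warm"), ("sun", "cold"),
    ("sun", "warm"), ("rain", "cold"),
]

n = len(trials)  # total number of trials
joint = {pair: count / n for pair, count in Counter(trials).items()}

# p(X="rain", Y="cold"): proportion of trials where both events happened together.
print(joint[("rain", "cold")])  # 3/8 = 0.375
```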


Definition: Marginal Probability

The marginal distribution describes the probability distribution of a single variable in isolation, ignoring the values of the other variables.

For a random variable X, the marginal distribution is defined as: $$p(X=x)$$ It states: The proportion of times this event (X=x) happened out of the total number of trials.

Given the joint probability distribution, we can easily calculate the marginal using the sum rule:

For two random variables X and Y: $$ p(X=x) = \sum_{y \in Y} p(X=x, Y=y)$$ Here we marginalize out Y to obtain the marginal probability distribution of X (hence the name).
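A minimal sketch of the sum rule, using a small hand-picked joint table (the probabilities are assumptions chosen only so that they sum to 1):

```python
# Made-up joint distribution p(X=x, Y=y) over X in {0, 1} and Y in {0, 1, 2}.
joint = {
    (0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
    (1, 0): 0.05, (1, 1): 0.25, (1, 2): 0.30,
}

# Sum rule: p(X=x) = sum over y of p(X=x, Y=y); Y is marginalized out.
marginal_x = {}
for (x, y), p in joint.items():
    marginal_x[x] = marginal_x.get(x, 0.0) + p

print(marginal_x)  # approximately {0: 0.4, 1: 0.6}
```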


Definition: Conditional Probability Distribution

Conditional distributions are a way of describing the probability distribution of one variable, given that the value of another variable is known. In other words, a conditional distribution provides information on how the probability distribution of one variable changes in response to changes in another variable.

For two random variables X and Y, the conditional distribution is defined as: $$p(X=x | Y=y) = \frac{p(X=x, Y=y)}{p(Y=y)}$$ It states: The proportion of times the event X=x happens out of the times that the event Y=y happens.
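A sketch of the same idea in code, again on an assumed joint table: the conditional is the joint probability renormalized by the marginal of the conditioning variable:

```python
# Made-up joint distribution p(X=x, Y=y) over X in {0, 1} and Y in {0, 1, 2}.
joint = {
    (0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
    (1, 0): 0.05, (1, 1): 0.25, (1, 2): 0.30,
}

def conditional_x_given_y(joint, y):
    """p(X=x | Y=y) = p(X=x, Y=y) / p(Y=y) for every value x."""
    p_y = sum(p for (_, yv), p in joint.items() if yv == y)  # marginal p(Y=y)
    return {x: p / p_y for (x, yv), p in joint.items() if yv == y}

print(conditional_x_given_y(joint, 1))  # roughly {0: 0.444, 1: 0.556}
```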

Rearranging the conditional probability gives an expression for the joint probability. The result is the product rule:

$$p(X=x, Y=y) = p(X=x|Y=y)p(Y=y)$$ Via the symmetric property of the joint probability, it also holds that: $$p(X=x, Y=y) = p(X=x|Y=y)p(Y=y)$$ $$ = p(Y=y, X=x) = p(Y=y|X=x)p(X=x)$$
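The product rule can be checked numerically on such an assumed table: multiplying the conditional by the marginal recovers the joint probability:

```python
# Made-up joint distribution p(X=x, Y=y).
joint = {
    (0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
    (1, 0): 0.05, (1, 1): 0.25, (1, 2): 0.30,
}

x, y = 1, 2
p_y = sum(p for (_, yv), p in joint.items() if yv == y)  # p(Y=y)
p_x_given_y = joint[(x, y)] / p_y                        # p(X=x | Y=y)

# Product rule: p(X=x | Y=y) * p(Y=y) should equal p(X=x, Y=y).
print(abs(p_x_given_y * p_y - joint[(x, y)]) < 1e-12)  # True
```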

Given this expression of the joint probability, we can rewrite the sum rule:

$$ p(X=x) = \sum_{y \in Y} p(X=x, Y=y) = \sum_{y \in Y} p(X=x|Y=y)p(Y=y)$$ $$ \Longleftrightarrow p(X=x) = \sum_{y \in Y} p(Y=y, X=x) = \sum_{y \in Y} p(Y=y|X=x)p(X=x)$$ The same expansion with the roles of X and Y exchanged, $$p(Y=y) = \sum_{x \in X} p(Y=y|X=x)p(X=x),$$ is known as the law of total probability and will appear as the denominator in Bayes' theorem below.
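Likewise, a quick check of the rewritten sum rule on the assumed table: summing p(X=x|Y=y)p(Y=y) over all y reproduces the marginal p(X=x):

```python
# Made-up joint distribution p(X=x, Y=y).
joint = {
    (0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
    (1, 0): 0.05, (1, 1): 0.25, (1, 2): 0.30,
}

ys = {y for (_, y) in joint}
p_y = {y: sum(p for (_, yv), p in joint.items() if yv == y) for y in ys}  # p(Y=y)

x = 0
# Rewritten sum rule: p(X=x) = sum over y of p(X=x | Y=y) * p(Y=y)
p_x = sum((joint[(x, y)] / p_y[y]) * p_y[y] for y in ys)
print(round(p_x, 3))  # 0.4, matching the direct marginalization of the joint
```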

We now have the tools at hand to prove Bayes' theorem:

$$p(X=x | Y=y) = \frac{p(X=x, Y=y)}{p(Y=y)}$$ $$= \frac{p(Y=y|X=x)p(X=x)}{p(Y=y)}$$ $$= \frac{p(Y=y|X=x)p(X=x)}{\sum_{x \in X} p(Y=y|X=x)p(X=x)}$$ with $$p(Y=y | X=x): \text{Likelihood}$$ $$p(X=x): \text{Prior}$$ $$p(Y=y): \text{Evidence}$$
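As a concrete worked example (all numbers are hypothetical), the classic diagnostic-test calculation: a prior over X (sick or healthy), a likelihood for a positive test result Y, and the evidence computed with the sum rule give the posterior:

```python
# Hypothetical numbers, chosen only for illustration.
prior = {"sick": 0.01, "healthy": 0.99}       # p(X=x)
likelihood = {"sick": 0.95, "healthy": 0.05}  # p(Y="positive" | X=x)

# Evidence via the sum rule: p(Y="positive") = sum_x p(Y="positive" | X=x) p(X=x)
evidence = sum(likelihood[x] * prior[x] for x in prior)

# Bayes' theorem: posterior p(X="sick" | Y="positive")
posterior_sick = likelihood["sick"] * prior["sick"] / evidence
print(round(posterior_sick, 3))  # approximately 0.161
```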

Bayes' theorem is a fundamental concept in probability theory and has wide-ranging applications across many different fields. Its importance lies in its ability to help us make more accurate predictions and decisions by incorporating both prior knowledge and new data. It allows us to update our beliefs about an event as we receive new information.

Keep in mind that the prior matters: if your prior beliefs about X are wrong, they can bias the resulting posterior.

Dustin