P(A)=∣A∣/∣Ω∣. ∣A∣ … number of outcomes where A occurs / number of ways A can happen. ∣Ω∣ … number of possible outcomes.
→ 0≤P(A)≤1
Flip a fair coin twice. What’s the probability of at least one heads occurring?
Ω={hh,ht,th,tt}, A={hh,ht,th} → P(A)=3/4
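A quick sketch (my addition, not from the notes): enumerating the sample space in Python confirms the proportion.

```python
from itertools import product

# Sample space of two fair coin flips: HH, HT, TH, TT.
omega = list(product("HT", repeat=2))
# Event A: at least one heads occurs.
A = [outcome for outcome in omega if "H" in outcome]
print(len(A) / len(omega))  # → 0.75
```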
Probability is just combinatorics, i.e. the math of combinations and proportions
Thinking of probability geometrically, as proportions, is incredibly helpful:
Pictured: The 6×6 sample space formed by all possible outcomes when rolling two dice.
The event X=5, where X is a random variable representing the dice sum, can be represented as an area (yellow) in this space, and its probability P(X=5) is the proportion of the area of this event to the total area of the sample space → P(X=5)=4/36=1/9
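The area picture is just brute-force counting; a small Python sketch (my addition) reproduces it:

```python
from itertools import product

# All 36 equally likely outcomes of rolling two dice.
omega = list(product(range(1, 7), repeat=2))
# Event X = 5: the dice sum is 5 → (1,4), (2,3), (3,2), (4,1).
event = [o for o in omega if sum(o) == 5]
print(len(event), len(event) / len(omega))  # 4 outcomes out of 36 → 1/9
```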
Generally, you can think of the sample space as a unit square (area = 1).
Roll two dice. What is the probability that at least one die is a 5?
By tedious counting: Ω={(1,1),(1,2),…,(6,6)}, A={(5,1),…,(5,6),(1,5),…,(6,5)}, P(A)=11/36
… or we take the problem apart by cases: either the first die is a 5, or it isn’t and the second die is a 5. P(A)=P(X1=5)+P(X1≠5)⋅P(X2=5)=1/6+5/6⋅1/6=11/36
Visually, it’s easy to see the relation to binomial coefficients (→ “at least one 5” decomposes into “exactly one 5” plus “both dice are 5”) and how probabilities stack up.
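Both routes, direct counting and the case split, agree; a sanity-check sketch (my addition):

```python
from itertools import product

omega = list(product(range(1, 7), repeat=2))
# Direct counting: outcomes containing at least one 5.
direct = sum(1 for o in omega if 5 in o) / len(omega)
# Case split: first die is a 5, or it isn't and the second die is.
decomposed = 1 / 6 + 5 / 6 * 1 / 6
print(direct, decomposed)  # both equal 11/36 ≈ 0.306
```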
Probabilistic models are not just useful for truly random things, but also for things that are too complex to model exactly.
If there are n people in a room, how large does n need to be for at least 50% chance of at least 2 people sharing a birthday?
Each person has to compare with each other person: (n−1)+(n−2)+⋯+2+1=n(n−1)/2 comparisons, but this won’t help us get to an exact probability, as there’s over-counting, etc. Instead: P(x≥2)=1−P(x=0), where x is the number of people sharing a birthday.
The expression on the left is very hard to compute as again, you need to keep track of all the combinations, overlaps, …
For the expression on the right (the complement), we just need to divide the number of ways to pick n unique birthdays (sampling without replacement) by the number of possible birthday assignments (sampling with replacement): P(x=0)=365⋅364⋯(365−n+1)/365^n
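Computing the complement directly answers the question; a short sketch (my addition, assuming 365 equally likely birthdays):

```python
# P(at least 2 people share a birthday) = 1 - P(all n birthdays unique).
def p_shared(n):
    p_none = 1.0
    for k in range(n):
        p_none *= (365 - k) / 365  # k-th person avoids all earlier birthdays
    return 1 - p_none

# Smallest n with at least a 50% chance of a shared birthday.
n = 1
while p_shared(n) < 0.5:
    n += 1
print(n, round(p_shared(n), 4))  # → 23 0.5073
```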
The probability function P is a map from subsets of the sample space Ω (the set of all possible outcomes of a random experiment) to the real numbers:
P: 2^Ω → [0,1] ⊆ ℝ
P(Ω)=1 … the probability of something happening is 1.
P(∅)=0 … the probability of nothing happening is 0.
A⊆Ω ⟹ P(A)≥0 … the probability of an event is always non-negative.
A⊆B ⟹ P(A)≤P(B) … an event that is a subset of another has at most its probability.
P(Aᶜ)=1−P(A) … the probability of A not happening. Also sometimes denoted as P(¬A) (surprise, but without the log).
If A,B⊆Ω are disjoint (can’t occur at the same time), the probability of either happening is the sum of the probabilities of each happening: A∩B=∅ ⟹ P(A∪B)=P(A)+P(B)
For non-disjoint events, we need to subtract the intersection, so we don’t count it twice (inclusion-exclusion principle). P(A∪B)=P(A)+P(B)−P(A∩B)
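A numeric check of inclusion-exclusion on the two-dice space (my addition; exact arithmetic via `fractions`):

```python
from itertools import product
from fractions import Fraction

omega = set(product(range(1, 7), repeat=2))
A = {o for o in omega if o[0] == 5}     # first die is a 5
B = {o for o in omega if sum(o) == 6}   # dice sum is 6 (overlaps A at (5,1))
p = lambda E: Fraction(len(E), len(omega))
print(p(A | B), p(A) + p(B) - p(A & B))  # → 5/18 5/18
```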
The “counting” from earlier is just asking about the relative sizes of sets; proportions:
Pictured: the intersection “A and B happened” and the union “A or B happened” as areas.
Now it’s also clearer what happens when we multiply or add probabilities: ∩ ⟺ ⋅ and ∪ ⟺ +
With the caveat of overcounting for non-disjoint events in the union, and, for the intersection: if the events are not independent, we need the chain rule of probability (see below).
Conditional probability
A,B are two events (outcomes) of a random experiment. The probability of an event A given that another event B has occurred is:
P(A∣B)=P(A∩B)/P(B)
If we know that B has occurred, B becomes the new Ω:
In this case, A becomes a lot more likely, as it occupies a larger fraction of the sample space than it did before.
It’s easy to see geometrically: the reverse is not true, i.e. in general P(A∣B)≠P(B∣A) (completely different proportions).
Ω = deck of 52 cards. A = card is a spade → P(A)=13/52=1/4. B = card is a queen → P(B)=4/52=1/13. P(A∩B)=P(A)P(B)=1/52
The probability of getting a spade (or a queen) doesn’t change if we restrict ourselves to the set B (or A): P(A∣B)=P(A) and P(B∣A)=P(B) — the events are independent.
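Verifying the independence on the card deck (my addition):

```python
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = list(product(ranks, suits))          # 52 cards
A = {c for c in deck if c[1] == "spades"}   # card is a spade
B = {c for c in deck if c[0] == "Q"}        # card is a queen
p_A = len(A) / len(deck)                    # 13/52 = 1/4
p_A_given_B = len(A & B) / len(B)           # restrict Ω to B: still 1/4
print(p_A, p_A_given_B)                     # → 0.25 0.25, so A, B independent
```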
Two dice are rolled. A … one die is a 3. B … the sum of the dice is 6. What’s P(A∣B)?
→ P(A∣B)=1/5, since there’s one out of the 5 possible outcomes (given B) where either of the dice is a 3.
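Conditioning is just counting within the restricted sample space; a sketch (my addition):

```python
from itertools import product

omega = list(product(range(1, 7), repeat=2))
B = [o for o in omega if sum(o) == 6]    # new sample space: 5 outcomes
A_given_B = [o for o in B if 3 in o]     # only (3, 3)
print(len(A_given_B) / len(B))           # → 0.2
```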
Law of total probability
Given a partition of the sample space Ω into n disjoint events (Ω=⋃ᵢ₌₁ⁿ Bᵢ, Bᵢ∩Bⱼ=∅ for i≠j), the probability of an event A is the sum of the probabilities of A given each of the Bᵢ, weighted by the probability of each Bᵢ: P(A)=∑ᵢ₌₁ⁿ P(A∣Bᵢ)P(Bᵢ)
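As a check (my addition): partition the two-dice space by the value of the first die and recover P(sum = 6) by the law of total probability.

```python
from itertools import product
from fractions import Fraction

omega = list(product(range(1, 7), repeat=2))
# Partition Ω by the first die: B_i = {first die shows i}, i = 1..6.
total = Fraction(0)
for i in range(1, 7):
    B_i = [o for o in omega if o[0] == i]
    p_B = Fraction(len(B_i), len(omega))  # P(B_i) = 1/6
    p_A_given_B = Fraction(sum(1 for o in B_i if sum(o) == 6), len(B_i))
    total += p_A_given_B * p_B
direct = Fraction(sum(1 for o in omega if sum(o) == 6), len(omega))
print(total, direct)  # → 5/36 5/36
```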
Ω … the sample space, the set of all possible outcomes of a random experiment. A,B,C⊆Ω … events. P:F→[0,1] … probability measure. P(A), aka Pr(A), aka ℙ(A).
X,Y,Xi … random variables; X=(X1,X2,…,Xd) … a vector of random variables. x,y,z … realized values; x=(x1,x2,…,xd)
PX … the probability distribution “law” of a random variable X. P(y∣x)=P(Y=y∣X=x), P(xi∣y)=P(X=xi∣Y=y) … shorthand notation for probabilities of random variables taking on specific values.
p(x)=pX(x) … the probability of a random variable X taking on value x, general notation; specifically:
… =P(X=x) … PMF (discrete; is a probability)
… =fX(x) … PDF (continuous; not a probability — probabilities come from integrals: P(a≤X≤b)=∫_a^b fX(x)dx)
P(A∣B), p(x∣θ) … conditional probability, discrete and continuous, respectively.
{pθ:θ∈Θ} … a family of distributions indexed by θ.
Its use can depend on context:
PMF/PDF := p(x;θ)=pθ(x)=p(x∣θ) … the probability of data x given fixed parameters θ. (PMF/PDF of x)
L(θ;x) := p(x;θ)=pθ(x)=p(x∣θ) … the likelihood of parameters θ given fixed data x. (not a density in θ!)
p(x) … a distribution without explicit parameters, e.g. the true data distribution, or a marginal distribution. p(θ) … a distribution over parameters, e.g. a prior. p(θ∣x) or p(θ;x) … the posterior distribution over parameters given data x.
Hypergeometric Distribution (sampling without replacement)
TODO (context unclear): give the general form, make the connection to the multinomial (sampling with replacement), move into a separate section, and maybe add a note on sampling with/without replacement.
If we have N=NA+NB items, of which NA are of type A and NB are of another type B.
Then the probability of sampling n=a+b items, of which a are of type A and b are of type B, without replacement is given by the bivariate hypergeometric distribution:
P(XA=a, XB=b) = (NA choose a)(NB choose b) / (N choose n)
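A direct implementation with `math.comb` (my addition; the card-deck example below is an illustrative choice, not from the notes):

```python
from math import comb

def hypergeom_pmf(a, b, N_A, N_B):
    """P(X_A = a, X_B = b) when drawing n = a + b items without replacement
    from a population of N_A type-A and N_B type-B items."""
    return comb(N_A, a) * comb(N_B, b) / comb(N_A + N_B, a + b)

# Example: draw 5 cards from 13 spades (type A) and 39 non-spades (type B);
# probability of exactly 2 spades.
print(round(hypergeom_pmf(2, 3, 13, 39), 4))  # ≈ 0.2743
```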
Introduction (old and bad)
Let π(n) denote the nth digit of π.
Consider the quantity (number) d=π(3).
Is it even or odd? d=4 → even.
What about π(101000)?
There are two approaches to probability:
Bayesian approach: “d is even with probability 0.5” — probability represents uncertainty about fixed but unknown values. Common in Machine Learning.
Frequentist approach: “d is even with probability 0 or 1, but I don’t know which” — probability is a mathematically definable thing about frequencies. More common, esp. in rigorous mathematical theory.
Important points:
The total probability is always 1:
∫Xp(x)dx=1
We care about independence:
p(x,y)=p(x)⋅p(y)
We care about expectation: the probability-weighted average of some other function (e.g. a darts player’s points as a function of dart positions)
Ep[f]=∫X f(x)p(x)dx (Wandb YT)
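The integral can be approximated by sampling; a Monte Carlo sketch (my addition, using X ~ Uniform(0,1) and f(x)=x² as an illustrative choice, where the exact answer is 1/3):

```python
import random

random.seed(0)

# Estimate E_p[f] by averaging f over samples from p.
f = lambda x: x * x
n = 100_000
estimate = sum(f(random.random()) for _ in range(n)) / n
print(round(estimate, 3))  # close to 1/3 ≈ 0.333
```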