Steve for a 🐐ed intro.
Probability
“Laplace’s Rule”: $P(A) = \frac{|A|}{|\Omega|}$
$|A|$ … number of outcomes where $A$ occurs / number of ways $A$ can happen.
$|\Omega|$ … number of possible outcomes.
→
Flip a fair coin twice. What’s the probability of at least one heads occurring? 3 of the 4 equally likely outcomes (HH, HT, TH, TT) contain a heads, so $P = \frac{3}{4}$.
So probability is just combinatorics (counting combinations).
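The counting view can be checked by brute force. A minimal sketch, assuming a fair coin and ordered outcomes; the variable names are illustrative only:

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered results of two coin flips (HH, HT, TH, TT).
omega = list(product("HT", repeat=2))

# Event: at least one heads.
event = [outcome for outcome in omega if "H" in outcome]

# Laplace's rule: favourable outcomes / possible outcomes.
p_at_least_one_heads = Fraction(len(event), len(omega))
print(p_at_least_one_heads)  # 3/4
```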
Roll two dice. What is the probability that at least one die is a 5?
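By tedious counting: 11 of the 36 outcomes contain a 5, so $P = \frac{11}{36}$.
… or we take the problem apart: either the first die is a 5, or it is not (the complement) and the second die is a 5: $\frac{1}{6} + \frac{5}{6} \cdot \frac{1}{6} = \frac{11}{36}$.
Visually, it’s easy to see the relation to binomial coefficients (→ the probability of getting a 5 on at least one die is the probability of getting a 5 on exactly one die plus the probability of getting a 5 on both dice: $\binom{2}{1} \cdot \frac{1}{6} \cdot \frac{5}{6} + \binom{2}{2} \cdot \left(\frac{1}{6}\right)^2 = \frac{10}{36} + \frac{1}{36}$) and how probabilities stack up.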
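The different routes to the two-dice answer can be compared directly. This is a small sketch assuming two fair six-sided dice; it enumerates all 36 ordered rolls and checks that counting, the complement, and the "first die or else second die" split agree:

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # 36 ordered rolls

# 1) Tedious counting: rolls that contain at least one 5.
p_count = Fraction(sum(1 for roll in omega if 5 in roll), len(omega))

# 2) Complement: 1 - P(no 5 on either die).
p_complement = 1 - Fraction(5, 6) ** 2

# 3) Split: first die is a 5, or it is not and the second die is a 5.
p_split = Fraction(1, 6) + Fraction(5, 6) * Fraction(1, 6)

# "At least one 5" = "exactly one 5" + "two 5s" (10/36 + 1/36).
p_exactly_one = Fraction(sum(1 for roll in omega if roll.count(5) == 1), len(omega))
p_both = Fraction(sum(1 for roll in omega if roll.count(5) == 2), len(omega))

print(p_count, p_complement, p_split)  # 11/36 11/36 11/36
```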
Probabilistic models are not just useful for truly random things, but also for things that are too complex to model exactly.
If there are $n$ people in a room, how large does $n$ need to be for at least a $50\%$ chance of at least $2$ people sharing a birthday?
Each person has to be compared with every other person ($\binom{n}{2}$ pairs), but this won’t help us get to an exact probability, as there’s over-counting, etc.
$P(X \geq 2) = 1 - P(X < 2)$, where $X$ is the number of people with the same birthday.
The expression on the left is very hard to compute, as you again need to keep track of all the combinations, overlaps, …
So for the right expression (the complement), we just need to divide the number of ways to give everyone a unique birthday (sampling without replacement) by the number of possible birthday assignments:
$$P(X < 2) = \frac{365 \cdot 364 \cdots (365 - n + 1)}{365^n}$$
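The complement trick is easy to evaluate numerically. The sketch below assumes the standard version of the question (365 equally likely, independent birthdays; at least a 50% chance; at least 2 people sharing); the function name `p_shared_birthday` is made up for illustration:

```python
def p_shared_birthday(n, days=365):
    """P(at least two of n people share a birthday), via the complement."""
    p_all_distinct = 1.0
    for i in range(n):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct

# Smallest group size with at least a 50% chance of a shared birthday.
n = 1
while p_shared_birthday(n) < 0.5:
    n += 1

print(n, round(p_shared_birthday(n), 4))  # 23 0.5073
```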
Properties of probability
The probability function is a map from subsets of the sample space $\Omega$ (the set of all possible outcomes of a random experiment) to the real numbers: $P: \mathcal{P}(\Omega) \to \mathbb{R}$
$P(\Omega) = 1$ … the probability of something happening is 1.
$P(\emptyset) = 0$ … the probability of nothing happening is 0.
$P(A) \geq 0$ … the probability of an event is always non-negative.
$A \subseteq B \Rightarrow P(A) \leq P(B)$ … a subset of an event has at most the probability of the set it’s a subset of.
$P(A^c) = 1 - P(A)$ … the probability of $A$ not happening. Also sometimes denoted as (surprise, but without the log).
If $A$ and $B$ are disjoint (can’t occur at the same time), the probability of either happening is the sum of the probability of each happening: $P(A \cup B) = P(A) + P(B)$
For non-disjoint events, we need to subtract the intersection, so we don’t count it twice (inclusion-exclusion principle): $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
The “counting” from earlier is just asking about the relative sizes of sets; proportions: $P(A) = \frac{|A|}{|\Omega|}$
“$A$ and $B$ happened”: $A \cap B$. “$A$ or $B$ happened”: $A \cup B$.
Now it’s also clearer what happens when we multiply or add probabilities: $P(A \cap B) = P(A) \cdot P(B)$ and $P(A \cup B) = P(A) + P(B)$
With the caveat of overcounting for non-disjoint events in the case of the union, and the caveat of independence for the intersection: if the events are not independent, we need to use the chain rule of probability (see below).
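Both caveats show up in a tiny example with Python sets. This is a sketch on one fair die; the two events (an even roll, a roll greater than 3) are chosen only for illustration and are not from the notes:

```python
from fractions import Fraction

omega = set(range(1, 7))   # one fair die
A = {2, 4, 6}              # hypothetical event: roll is even
B = {4, 5, 6}              # hypothetical event: roll is greater than 3

def prob(event):
    """Laplace's rule: relative size of the event within omega."""
    return Fraction(len(event), len(omega))

# Union: adding P(A) and P(B) counts the overlap {4, 6} twice.
p_union = prob(A | B)
p_inclusion_exclusion = prob(A) + prob(B) - prob(A & B)

# Intersection: A and B are not independent, so P(A) * P(B) is off ...
naive_product = prob(A) * prob(B)
# ... while P(A | B) * P(B) recovers P(A and B).
p_a_given_b = Fraction(len(A & B), len(B))
chain_rule = p_a_given_b * prob(B)

print(p_union, prob(A & B), naive_product, chain_rule)  # 2/3 1/3 1/4 1/3
```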
Conditional probability
$A$ and $B$ are two events (outcomes) of a random experiment. The probability of $A$ given that $B$ has occurred is: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$
If we know that $B$ has occurred, $B$ becomes the new $\Omega$: $P(A \mid B) = \frac{|A \cap B|}{|B|}$
→
In this case, $A$ becomes a lot more likely, as it occupies a larger fraction of the sample space than it did before.
Independent events
Two events $A$ and $B$ are independent if knowing $B$ gives no extra information about $A$, and vice versa: $P(A \mid B) = P(A)$ and $P(B \mid A) = P(B)$
Equivalently, using the formula of conditional probability, we can say: $P(A \cap B) = P(A) \cdot P(B)$
EXAMPLE
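$\Omega$ = deck of 52 cards
$A$ = card is spade → $P(A) = \frac{13}{52} = \frac{1}{4}$
$B$ = card is queen → $P(B) = \frac{4}{52} = \frac{1}{13}$
The probability of getting a spade (or a queen) doesn’t change if we restrict ourselves to the set $B$ of queens (or the set $A$ of spades): $P(A \mid B) = \frac{1}{4} = P(A)$ and $P(B \mid A) = \frac{1}{13} = P(B)$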
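The card example can be verified by enumerating a standard 52-card deck. A minimal sketch; the rank and suit encodings (`"Q"` for queen, `"S"` for spades) are arbitrary choices made here:

```python
from fractions import Fraction

ranks = list("23456789TJQKA")   # 13 ranks, "T" stands for ten
suits = list("SHDC")            # spades, hearts, diamonds, clubs
deck = [(rank, suit) for rank in ranks for suit in suits]

spades = {card for card in deck if card[1] == "S"}
queens = {card for card in deck if card[0] == "Q"}

p_spade = Fraction(len(spades), len(deck))
p_queen = Fraction(len(queens), len(deck))

# Restrict the sample space to queens (or to spades) and count again.
p_spade_given_queen = Fraction(len(spades & queens), len(queens))
p_queen_given_spade = Fraction(len(spades & queens), len(spades))

# Product form of independence.
p_both = Fraction(len(spades & queens), len(deck))

print(p_spade, p_spade_given_queen)  # 1/4 1/4
print(p_queen, p_queen_given_spade)  # 1/13 1/13
```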
Two dice are rolled. $B$ … one die is a 3. $C$ … the sum of the dice is 6. What’s $P(B \mid C)$?
→ $P(B \mid C) = \frac{1}{5}$, since only one of the 5 possible outcomes (given $C$), namely $(3, 3)$, contains a 3.
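The dice question can be answered both ways: by shrinking the sample space to the rolls that sum to 6, and by the ratio definition of conditional probability. A sketch assuming two fair dice; the set names are descriptive stand-ins for the event letters:

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))

has_three = {roll for roll in omega if 3 in roll}        # at least one die shows a 3
sum_is_six = {roll for roll in omega if sum(roll) == 6}  # (1,5), (2,4), (3,3), (4,2), (5,1)

# View 1: the conditioning event becomes the new sample space.
p_restricted = Fraction(len(has_three & sum_is_six), len(sum_is_six))

# View 2: definition P(has_three | sum_is_six) = P(both) / P(sum_is_six).
p_both = Fraction(len(has_three & sum_is_six), len(omega))
p_sum_is_six = Fraction(len(sum_is_six), len(omega))
p_definition = p_both / p_sum_is_six

print(p_restricted, p_definition)  # 1/5 1/5
```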
Chain rule of probability
By simply rearranging the formula for conditional probability, we get: $P(A \cap B) = P(A \mid B) \cdot P(B)$
… for $A$ and $B$ to happen, $B$ has to happen, and then $A$ has to happen given that $B$ has happened.
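A quick sanity check of the chain rule, using an example that is not in the notes: drawing two cards without replacement and asking for two aces. The events are dependent, so the chain rule is needed; the result is compared against brute-force enumeration of all ordered pairs:

```python
from fractions import Fraction
from itertools import permutations

ranks = list("23456789TJQKA")
suits = list("SHDC")
deck = [(rank, suit) for rank in ranks for suit in suits]

# Chain rule: P(first ace and second ace) = P(first ace) * P(second ace | first ace).
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)  # one ace and one card are gone
p_chain = p_first_ace * p_second_ace_given_first

# Brute force over all ordered draws of two distinct cards.
draws = list(permutations(deck, 2))
both_aces = [d for d in draws if d[0][0] == "A" and d[1][0] == "A"]
p_enumerated = Fraction(len(both_aces), len(draws))

print(p_chain, p_enumerated)  # 1/221 1/221
```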
Law of total probability
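Given a partition of the sample space $\Omega$ into disjoint events $B_1, \dots, B_n$ ($B_i \cap B_j = \emptyset$ for $i \neq j$, $\bigcup_i B_i = \Omega$), the probability of an event $A$ is the sum of the probabilities of $A$ given each of the $B_i$, weighted by the probability of each $B_i$: $P(A) = \sum_{i} P(A \mid B_i) \cdot P(B_i)$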
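To see the weighted sum at work, the sketch below reuses the two-dice setting and partitions the sample space by the value of the first die. The choice of partition and of the event (sum equals 6) is an illustration, not part of the notes:

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))

def prob(event):
    return Fraction(len(event), len(omega))

# Event of interest: the two dice sum to 6.
A = [roll for roll in omega if sum(roll) == 6]

# Partition: B_i = "first die shows i", for i = 1..6.
total = Fraction(0)
for i in range(1, 7):
    B_i = [roll for roll in omega if roll[0] == i]
    A_and_B_i = [roll for roll in B_i if sum(roll) == 6]
    p_A_given_B_i = Fraction(len(A_and_B_i), len(B_i))
    total += p_A_given_B_i * prob(B_i)  # weight by P(B_i)

print(total, prob(A))  # 5/36 5/36
```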
What’s this again
Where $N$ is the total number of items, of which $K$ are of one type and $N - K$ are of another type.
We are sampling $n$ of these without replacement, and the hypergeometric distribution gives the probability of how likely it is to sample $k$ items from the $K$ and $n - k$ from the $N - K$.
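Assuming the expression in question is the two-type case of sampling without replacement (usually called the hypergeometric distribution), it can be written with binomial coefficients. The symbols `N`, `K`, `n`, `k` and the function name are assumptions made for this sketch:

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(exactly k of the n sampled items are of the first type).

    N: total items, K: items of the first type, n: sample size.
    Assumes sampling without replacement and 0 <= k <= n <= N.
    """
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))

# Small check: 5 items, 2 of the first type, draw 2, exactly 1 of the first type.
print(hypergeom_pmf(1, N=5, K=2, n=2))  # 3/5

# The probabilities over all possible k sum to 1.
total = sum(hypergeom_pmf(k, N=52, K=13, n=5) for k in range(6))
```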
Introduction (old and bad)
Let denote the nth digit of .
Consider the quantity (number) of .
Is it even or odd? → even.
What about ?
There are two approaches to probability:
- “$d$ is even with probability $\frac{1}{2}$”
  - Probability: representing uncertainty about certain values.
  - Bayesian approach. Common in machine learning.
- “$d$ is even with probability 0 or 1, but I don’t know which”
  - Probability: a mathematically definable thing about frequencies.
  - Frequentist approach. More common, esp. in rigorous mathematical theory.
Important points:
- The total probability is always 1.
- We care about independence: $P(A \cap B) = P(A) \cdot P(B)$
- We care about expectation: the probability times some other function (e.g. darts player points and dart positions): $\mathbb{E}[f(X)] = \sum_x P(x) \cdot f(x)$

Wandb YT