A probability distribution is a function that gives the probabilities of the different possible outcomes of an experiment, or the probability that a random variable $X$ takes a particular value $x$, given the parameters $\theta$.
In simpler terms, it is a model of how likely an event, outcome, or value is, given some parameters that characterize a system:
$$P(X = x \mid \theta)$$
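The probabilities in a discrete distribution must sum to 1, where the sum runs over every value $x$ that the random variable $X$ can take:
$$\sum_{x} P(X = x) = 1$$
Or for the continuous case, where the density $f_X$ must integrate to 1:
$$\int_{-\infty}^{\infty} f_X(x)\,dx = 1$$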
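Both normalization conditions can be checked numerically. The sketch below is only an illustration: the binomial and standard normal distributions, the parameter values, and the integration range are all assumptions chosen for the example, and the integral is approximated with a simple midpoint Riemann sum.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of a normal distribution with mean mu and std sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Discrete case: sum the pmf over the whole support {0, ..., n}.
n, p = 10, 0.3  # illustrative parameters
pmf_total = sum(binomial_pmf(k, n, p) for k in range(n + 1))

# Continuous case: midpoint Riemann sum over [-10, 10].
# The tails outside this range are negligible for a standard normal.
lower, upper, steps = -10.0, 10.0, 20_000
dx = (upper - lower) / steps
pdf_total = sum(normal_pdf(lower + (i + 0.5) * dx) for i in range(steps)) * dx

print(f"sum of pmf:      {pmf_total:.12f}")
print(f"integral of pdf: {pdf_total:.12f}")
```

Both totals should come out very close to 1, up to floating-point and discretization error.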
TODO: rework this note once done with the measure-theoretic perspective.
Wrong:
Probability distributions are classified into two main types:
probability density function (pdf) for continuous variables
probability mass function (pmf) for discrete variables
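Formally, a distribution is characterized by its cumulative distribution function (CDF):
$$F_X(x) = P(X \le x)$$
So, when $F_X$ is differentiable, we get the probability density function (pdf) by taking the derivative of the cumulative distribution function:
$$f_X(x) = \frac{d}{dx} F_X(x)$$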
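The relationship between the CDF and the pdf can be sanity-checked numerically. The sketch below assumes a standard normal distribution (my choice for illustration), uses the closed-form CDF via `math.erf`, and compares a central finite difference of the CDF against the known density at a few arbitrary points.

```python
import math


def normal_cdf(x: float) -> float:
    """CDF of the standard normal, F(x) = P(X <= x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_pdf(x: float) -> float:
    """Density of the standard normal."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def numerical_derivative(f, x: float, h: float = 1e-5) -> float:
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


points = [-2.0, -0.5, 0.0, 1.0, 2.5]  # arbitrary evaluation points
max_error = max(
    abs(numerical_derivative(normal_cdf, x) - normal_pdf(x)) for x in points
)

for x in points:
    approx = numerical_derivative(normal_cdf, x)
    print(f"x={x:+.1f}  dF/dx ~ {approx:.6f}  pdf = {normal_pdf(x):.6f}")
```

For a discrete variable the CDF is a step function, so the analogous operation is taking the size of the jump at each value rather than a derivative.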
Some common distributions:
Bernoulli distributions model binary outcomes.
Binomial distributions model the number of successes in a fixed number of independent trials with the same success probability.
Normal distributions arise in many natural processes: as you add up many independent random variables with finite variance, their suitably normalized sum tends to follow a normal distribution, even if they don’t follow a normal distribution individually (Central Limit Theorem).
Poisson distributions model the number of events occurring in a fixed interval of time or space, typically rare events that occur independently at a constant average rate.
Exponential distributions model the waiting times between Poisson events.
Multinomial distributions generalize the binomial distribution to more than two categories.
Geometric distribution
Cauchy distribution
Pareto distribution
…
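The link between exponential waiting times and Poisson counts in the list above can be illustrated with a small simulation. This is a rough sketch under assumed values: the rate, the number of intervals, the seed, and the helper name `poisson_counts_from_exponential_gaps` are all hypothetical choices made for the example.

```python
import random


def poisson_counts_from_exponential_gaps(
    rate: float, n_intervals: int, rng: random.Random
) -> list[int]:
    """Place events on a timeline using exponential gaps, then count
    how many events fall into each unit-length interval."""
    counts = [0] * n_intervals
    t = rng.expovariate(rate)  # time of the first event
    while t < n_intervals:
        counts[int(t)] += 1  # event falls in interval [int(t), int(t) + 1)
        t += rng.expovariate(rate)  # waiting time until the next event
    return counts


rng = random.Random(0)  # fixed seed so the run is reproducible
rate = 3.0  # assumed average number of events per unit interval
counts = poisson_counts_from_exponential_gaps(rate, 20_000, rng)

mean_count = sum(counts) / len(counts)
var_count = sum((c - mean_count) ** 2 for c in counts) / len(counts)

print(f"mean count per interval:     {mean_count:.3f}")
print(f"variance of counts:          {var_count:.3f}")
```

A Poisson distribution has mean and variance both equal to its rate, so both printed values should land near the assumed rate of 3.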
Machine learning is essentially learning unknown probability distributions from data.
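As a toy illustration of learning a distribution from data, the sketch below draws samples from a normal distribution whose parameters are treated as unknown, then recovers them by maximum likelihood. The choice of a normal model, the "true" parameter values, the sample size, and the helper name `fit_normal_mle` are assumptions made for the example, not anything prescribed by this note.

```python
import math
import random


def fit_normal_mle(samples: list[float]) -> tuple[float, float]:
    """Maximum-likelihood estimates (mean, std) for a normal model.

    The MLE of the variance divides by n, not n - 1.
    """
    n = len(samples)
    mu_hat = sum(samples) / n
    var_hat = sum((x - mu_hat) ** 2 for x in samples) / n
    return mu_hat, math.sqrt(var_hat)


rng = random.Random(42)  # fixed seed so the run is reproducible
true_mu, true_sigma = 2.0, 0.5  # hidden parameters generating the data
data = [rng.gauss(true_mu, true_sigma) for _ in range(10_000)]

mu_hat, sigma_hat = fit_normal_mle(data)
print(f"estimated mean: {mu_hat:.3f} (true {true_mu})")
print(f"estimated std:  {sigma_hat:.3f} (true {true_sigma})")
```

Real models replace the two-parameter normal with a far more flexible family, but the idea of choosing parameters that make the observed data likely is the same.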
References
https://github.com/google-deepmind/distrax/tree/main/distrax/_src/distributions