A probability distribution is a function that gives the probabilities of the different possible outcomes of an experiment or, equivalently, the probability that a random variable $X$ takes a particular value $x$, given the parameters $\theta$.
In simpler terms, it is a model of how likely an event, outcome, or value is, given some parameters that characterize a system:

$$P(X = x \mid \theta)$$

The sum of the probabilities in the distribution must be equal to 1:

$$\sum_{x} P(X = x \mid \theta) = 1$$

Or, for the continuous case, where $p(x \mid \theta)$ is the density:

$$\int_{-\infty}^{\infty} p(x \mid \theta)\, dx = 1$$

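
The normalization condition can be checked numerically. The following is a minimal sketch using only the Python standard library; the choice of a Binomial(10, 0.3) pmf and a standard normal pdf, the helper names, and the integration range $[-10, 10]$ are all assumptions made for illustration.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of Normal(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def trapezoid(f, a: float, b: float, steps: int = 10_000) -> float:
    """Approximate the integral of f over [a, b] with the trapezoid rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

# Discrete case: sum the pmf over the whole support {0, ..., 10}.
pmf_total = sum(binomial_pmf(k, n=10, p=0.3) for k in range(11))

# Continuous case: integrate the pdf over [-10, 10]; the tails outside
# this range are negligible for a standard normal.
pdf_total = trapezoid(normal_pdf, -10.0, 10.0)

print(pmf_total, pdf_total)  # both should be very close to 1
```

Note that a density value can exceed 1 at a point (for example a normal with a small $\sigma$); only the integral over the whole range has to equal 1.
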
TODO: rework this note once done with the measure theory perspective.

Wrong:
Probability distributions are classified into two main types:
probability density function (pdf) for continuous variables
probability mass function (pmf) for discrete variables.

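Formally, for a continuous random variable $X$ with density $p(x \mid \theta)$, the cumulative distribution function $F$ is:

$$F(x) = P(X \le x) = \int_{-\infty}^{x} p(t \mid \theta)\, dt$$
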
So we get the probability density function $p(x \mid \theta)$ by taking the derivative of the cumulative distribution function $F$:

$$p(x \mid \theta) = \frac{d}{dx} F(x)$$

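
The relationship between a density and its cumulative distribution function can be checked numerically. This is a rough sketch for the standard normal only: the CDF is written with `math.erf`, and a central finite difference (with an assumed step size `h`) stands in for the derivative. The function names and test points are illustrative choices.

```python
import math

def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """CDF of Normal(mu, sigma^2), expressed through the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of Normal(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def numerical_derivative(f, x: float, h: float = 1e-5) -> float:
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Compare d/dx CDF(x) with the closed-form pdf at a few points.
points = (-1.5, 0.0, 0.7, 2.0)
errors = [abs(numerical_derivative(normal_cdf, x) - normal_pdf(x)) for x in points]
max_error = max(errors)

print(max_error)  # should be tiny compared with the pdf values themselves
```
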
Some common distributions:
Bernoulli distributions model binary outcomes.
Binomial distributions model the number of successes in a fixed number of independent trials that share the same success probability.
Normal distributions arise in many natural processes: by the Central Limit Theorem, the suitably normalized sum of many independent random variables with finite variance tends toward a normal distribution, even if the individual variables are not normally distributed.
Poisson distributions model the number of events occurring in a fixed interval of time or space, and are often used for rare events.
Exponential distributions model the waiting times between events in a Poisson process.
The multinomial distribution generalizes the binomial distribution to more than two categories.
Geometric distribution
Cauchy distribution
Pareto distribution
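
The Central Limit Theorem remark in the list above can be illustrated by simulation. The sketch below sums Uniform(0, 1) draws, which are clearly not normal on their own, then centers and scales each sum. The number of terms (30), the sample count, and the seed are arbitrary assumptions; the 0.6827 reference value is the probability that a standard normal falls within one standard deviation of its mean.

```python
import random
import statistics

def standardized_sum(n_terms: int, rng: random.Random) -> float:
    """Sum n_terms Uniform(0, 1) draws, then center and scale the sum.

    A Uniform(0, 1) variable has mean 1/2 and variance 1/12, so the sum
    has mean n_terms / 2 and variance n_terms / 12.
    """
    total = sum(rng.random() for _ in range(n_terms))
    mean = n_terms * 0.5
    std = (n_terms / 12.0) ** 0.5
    return (total - mean) / std

rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [standardized_sum(30, rng) for _ in range(20_000)]

sample_mean = statistics.fmean(samples)
sample_std = statistics.pstdev(samples)
# Fraction of standardized sums within one standard deviation of zero.
within_one_sigma = sum(abs(s) <= 1.0 for s in samples) / len(samples)

print(sample_mean, sample_std, within_one_sigma)
# Expect roughly 0, roughly 1, and roughly 0.68 if the sums are near-normal.
```
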

Machine learning is essentially learning unknown probability distributions from data.
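
As a small illustration of learning a distribution from data, the sketch below fits a normal model by maximum likelihood. This is only one simple instance of the idea, not a general recipe: the "unknown" distribution is simulated with assumed parameters (`true_mu`, `true_sigma`), and the helper names are hypothetical.

```python
import math
import random

def fit_normal_mle(data):
    """Maximum likelihood estimates (mu, sigma) under a normal model."""
    n = len(data)
    mu = sum(data) / n
    # The MLE of the variance divides by n, not n - 1.
    var = sum((x - mu) ** 2 for x in data) / n
    return mu, math.sqrt(var)

def avg_log_likelihood(data, mu, sigma):
    """Average log density of the data under Normal(mu, sigma^2)."""
    const = -0.5 * math.log(2 * math.pi) - math.log(sigma)
    return sum(const - 0.5 * ((x - mu) / sigma) ** 2 for x in data) / len(data)

# Simulate data from a distribution that the "learner" does not know.
rng = random.Random(42)
true_mu, true_sigma = 2.0, 0.5
data = [rng.gauss(true_mu, true_sigma) for _ in range(5_000)]

mu_hat, sigma_hat = fit_normal_mle(data)
fitted_ll = avg_log_likelihood(data, mu_hat, sigma_hat)
baseline_ll = avg_log_likelihood(data, 0.0, 1.0)  # an arbitrary wrong guess

print(mu_hat, sigma_hat)       # should land near the true parameters
print(fitted_ll, baseline_ll)  # the fitted model should score higher
```
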

References

normal dist statquest
article

https://github.com/google-deepmind/distrax/tree/main/distrax/_src/distributions