The term “score function” is a leftover from the context in which it was first used: genetic analysis.

Score function

The score function (or score) is the gradient of the log-likelihood function with respect to the parameter vector $\theta$, given observed data $x$:

$$s(\theta) = \nabla_\theta \log p(x \mid \theta)$$

It tells us in which direction in parameter space we should move $\theta$ to increase the likelihood of observing our data $x$.
For a scalar parameter (or for each component of $\theta$), a positive score suggests that increasing $\theta$ would improve the fit, while a negative score suggests decreasing it.
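To make the sign interpretation concrete, here is a minimal sketch that assumes a toy model not taken from these notes: i.i.d. Gaussian data with unknown mean $\mu$ and known $\sigma = 1$. For this model the score is $\sum_i (x_i - \mu)/\sigma^2$. The data values and the helper names (`log_likelihood`, `score`, `numerical_score`) are hypothetical and only for illustration.

```python
import math

def log_likelihood(mu, data, sigma=1.0):
    """Log-likelihood of i.i.d. N(mu, sigma^2) observations (toy model, assumed)."""
    n = len(data)
    sq = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - sq / (2 * sigma ** 2)

def score(mu, data, sigma=1.0):
    """Analytic score: d/dmu log p(data | mu) = sum(x - mu) / sigma^2."""
    return sum(x - mu for x in data) / sigma ** 2

def numerical_score(mu, data, sigma=1.0, h=1e-5):
    """Central finite difference of the log-likelihood, used as a sanity check."""
    return (log_likelihood(mu + h, data, sigma) - log_likelihood(mu - h, data, sigma)) / (2 * h)

# Hypothetical toy data; its sample mean is 2.1.
data = [1.8, 2.1, 2.4, 1.9, 2.3]

for mu in (1.0, 2.1, 3.0):
    print(f"mu={mu}: score={score(mu, data):+.3f}, finite diff={numerical_score(mu, data):+.3f}")
```

The score is positive at $\mu = 1.0$ (move up), essentially zero at the sample mean $2.1$, and negative at $\mu = 3.0$ (move down). The finite-difference column confirms that the score really is the derivative of the log-likelihood.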

The score function has mean zero under the true parameter value

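If $\theta$ is the parameter that generated our observed data (the true value, not an estimate), then

$$\mathbb{E}_{p(x \mid \theta)}\left[ s(\theta) \right] = 0$$

In other words, on average over datasets drawn from the model, we are at a stationary point of the log-likelihood: the expected log-likelihood $\mathbb{E}_{p(x \mid \theta)}\left[ \log p(x \mid \theta') \right]$, viewed as a function of $\theta'$, has zero gradient at $\theta' = \theta$.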
Note: A zero gradient is not sufficient for global optimality. It is also satisfied at local maxima, local minima, and saddle points, unless the parameter space is convex and the log-likelihood is concave, in which case every stationary point is a global maximum.
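A standard illustration of this caveat, sketched under assumptions not in the original notes: a Cauchy location model with unit scale and two hypothetical observations at $-5$ and $5$. The parameter space $\mathbb{R}$ is convex, but the log-likelihood is not concave. The score vanishes at $\theta = 0$, yet the second derivative there is positive, so this stationary point is a local minimum rather than a maximum.

```python
import math

# Hypothetical toy data: two well-separated observations (illustration only).
data = [-5.0, 5.0]

def cauchy_log_likelihood(theta, xs):
    """Log-likelihood of a unit-scale Cauchy location model, up to an additive constant."""
    return -sum(math.log(1.0 + (x - theta) ** 2) for x in xs)

def cauchy_score(theta, xs):
    """First derivative in theta: sum of 2u / (1 + u^2), with u = x - theta."""
    return sum(2.0 * (x - theta) / (1.0 + (x - theta) ** 2) for x in xs)

def cauchy_second_derivative(theta, xs):
    """Second derivative in theta: -sum of 2(1 - u^2) / (1 + u^2)^2, with u = x - theta."""
    return -sum(2.0 * (1.0 - (x - theta) ** 2) / (1.0 + (x - theta) ** 2) ** 2 for x in xs)

print("score at 0:            ", cauchy_score(0.0, data))
print("second derivative at 0:", cauchy_second_derivative(0.0, data))
print("log-likelihood at 0:   ", cauchy_log_likelihood(0.0, data))
print("log-likelihood at 5:   ", cauchy_log_likelihood(5.0, data))
```

The log-likelihood at $\theta = 5$ is higher than at the stationary point $\theta = 0$, so following "score equals zero" alone would land on the worst point between the two modes.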

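The expected value of the score function w.r.t. the model is zero:

$$
\begin{aligned}
\mathbb{E}_{p(x \mid \theta)}\left[ s(\theta) \right]
&= \mathbb{E}_{p(x \mid \theta)}\left[ \nabla_\theta \log p(x \mid \theta) \right] \\
&= \int \nabla_\theta \log p(x \mid \theta) \, p(x \mid \theta) \, dx \\
&= \int \frac{\nabla_\theta p(x \mid \theta)}{p(x \mid \theta)} \, p(x \mid \theta) \, dx \\
&= \nabla_\theta \int p(x \mid \theta) \, dx \\
&= \nabla_\theta 1 = 0
\end{aligned}
$$

Line 3: Take the derivative of the log, $\nabla_\theta \log p(x \mid \theta) = \nabla_\theta p(x \mid \theta) / p(x \mid \theta)$.
Line 4: Cancel $p(x \mid \theta)$ and swap gradient and integral (assuming the usual regularity conditions, e.g. the support of $p(x \mid \theta)$ does not depend on $\theta$).
Line 5: $p(x \mid \theta)$ is a PDF, so it integrates to 1, and the gradient of a constant is zero.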
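The same identity can be checked exactly for a discrete model, where the integral becomes a sum. This is a minimal sketch assuming a Bernoulli model $p(x \mid \theta) = \theta^x (1-\theta)^{1-x}$, which is not part of the original notes. Its score is $x/\theta - (1-x)/(1-\theta)$. The helper names are hypothetical. The last line evaluates the score at a parameter other than the one that generated the data, to show that the zero-mean property only holds at the true value.

```python
def bernoulli_pmf(x, theta):
    """p(x | theta) = theta^x (1 - theta)^(1 - x) for x in {0, 1}."""
    return theta if x == 1 else 1.0 - theta

def bernoulli_score(x, theta):
    """s(theta) = d/dtheta log p(x | theta) = x/theta - (1 - x)/(1 - theta)."""
    return x / theta - (1 - x) / (1.0 - theta)

def expected_score(theta_data, theta):
    """E_{x ~ p(x | theta_data)}[s(theta)], computed exactly as a sum over x in {0, 1}."""
    return sum(bernoulli_pmf(x, theta_data) * bernoulli_score(x, theta) for x in (0, 1))

for theta in (0.1, 0.5, 0.9):
    # Data generated at the same theta the score is evaluated at: zero up to rounding.
    print(f"theta={theta}: E[s] = {expected_score(theta, theta):.2e}")

# Mismatched parameter: data from theta_data = 0.5, score evaluated at theta = 0.1.
print(f"mismatched: E[s] = {expected_score(0.5, 0.1):.3f}")
```

In the mismatched case the expected score is $0.5 \cdot 10 - 0.5 / 0.9 \approx 4.44$. It is positive, pointing $\theta$ back up toward the true value $0.5$.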

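The covariance of the score function is called the Fisher information

It gives us a sense of the uncertainty of our estimate, i.e. it measures how much information our data $x$ provides about the parameter $\theta$. Because the score has mean zero, its covariance equals the expected outer product of the score with itself:

$$F(\theta) = \operatorname{Cov}_{p(x \mid \theta)}\left[ s(\theta) \right] = \mathbb{E}_{p(x \mid \theta)}\left[ s(\theta) \, s(\theta)^\top \right]$$

More information means less uncertainty. In the scalar case, the Cramér-Rao bound states that the variance of any unbiased estimator of $\theta$ is at least $1 / F(\theta)$.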
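A rough numerical check of both the definition and the "uncertainty" reading, sketched for an assumed Bernoulli model that does not appear in the original notes. Here each single draw is treated as the data $x$, so $F(\theta) = 1/(\theta(1-\theta))$. The sketch estimates $F(\theta)$ as the Monte Carlo variance of the score. It then compares the spread of the maximum-likelihood estimate (the sample mean) over repeated datasets of size $n$ with $1/(n F(\theta))$, since $n$ i.i.d. draws carry $n$ times the information of one. All function names, sample sizes, and seeds are hypothetical choices.

```python
import random

def bernoulli_score(x, theta):
    """s(theta) = x/theta - (1 - x)/(1 - theta) for a single draw x in {0, 1}."""
    return x / theta - (1 - x) / (1.0 - theta)

def fisher_information_exact(theta):
    """Closed form for one Bernoulli draw: F(theta) = 1 / (theta (1 - theta))."""
    return 1.0 / (theta * (1.0 - theta))

def fisher_information_mc(theta, num_samples=200_000, seed=0):
    """Monte Carlo estimate of Var[s(theta)], with x drawn from p(x | theta)."""
    rng = random.Random(seed)
    scores = [bernoulli_score(1 if rng.random() < theta else 0, theta)
              for _ in range(num_samples)]
    mean = sum(scores) / num_samples
    return sum((s - mean) ** 2 for s in scores) / (num_samples - 1)

def mle_variance_mc(theta, n=100, num_experiments=5_000, seed=1):
    """Empirical variance of the MLE (the sample mean) over repeated datasets of size n."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(num_experiments):
        successes = sum(1 for _ in range(n) if rng.random() < theta)
        estimates.append(successes / n)
    mean = sum(estimates) / num_experiments
    return sum((e - mean) ** 2 for e in estimates) / (num_experiments - 1)

theta, n = 0.3, 100
exact = fisher_information_exact(theta)
mc_fisher = fisher_information_mc(theta)
mle_var = mle_variance_mc(theta, n=n)

print("exact F(theta):      ", exact)
print("Monte Carlo F(theta):", mc_fisher)
print("1 / (n F(theta)):    ", 1.0 / (n * exact))
print("empirical Var of MLE:", mle_var)
```

With $\theta = 0.3$ the exact value is $1/0.21 \approx 4.76$, and $1/(n F(\theta)) = 0.0021$. For the Bernoulli sample mean the Cramér-Rao bound is attained exactly, so the empirical variance should land close to that value.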


https://agustinus.kristia.de/blog/fisher-information/