The term “score function” is a leftover from the context in which it was first used (genetic analysis).
Score function
The score function (or score) $s(\theta)$ is the gradient of the log-likelihood function with respect to a parameter vector $\theta$, given observed data $x$:

$$
s(\theta) = \nabla_\theta \log p(x \mid \theta)
$$
It tells us in which direction in parameter space we should move $\theta$ to increase the likelihood of observing our data $x$.
For a scalar parameter, a positive score suggests that increasing $\theta$ would improve the fit, while a negative score suggests decreasing it.
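To make the sign interpretation concrete, here is a minimal sketch assuming a Gaussian model $x_i \sim \mathcal{N}(\mu, \sigma^2)$ with known $\sigma = 1$ and the mean $\mu$ as the only parameter. The model, the data values, and the function names (`log_likelihood`, `score`, `numerical_score`) are illustrative assumptions. For i.i.d. data the score is $s(\mu) = \sum_i (x_i - \mu)/\sigma^2$, which the code checks against a finite-difference derivative of the log-likelihood.

```python
import math

def log_likelihood(mu: float, data: list[float], sigma: float = 1.0) -> float:
    """Log-likelihood of i.i.d. data under an assumed N(mu, sigma^2) model."""
    n = len(data)
    return (-0.5 * n * math.log(2 * math.pi * sigma**2)
            - sum((x - mu) ** 2 for x in data) / (2 * sigma**2))

def score(mu: float, data: list[float], sigma: float = 1.0) -> float:
    """Analytic score: d/dmu log p(data | mu) = sum(x - mu) / sigma^2."""
    return sum(x - mu for x in data) / sigma**2

def numerical_score(mu: float, data: list[float], sigma: float = 1.0,
                    h: float = 1e-5) -> float:
    """Central finite-difference approximation of the score, used as a check."""
    return (log_likelihood(mu + h, data, sigma)
            - log_likelihood(mu - h, data, sigma)) / (2 * h)

data = [1.8, 2.4, 2.1, 1.7]  # made-up observations with sample mean 2.0

for mu in (1.0, 2.0, 3.0):
    # Positive below the sample mean (increase mu), about zero at it,
    # negative above it (decrease mu).
    print(mu, score(mu, data), numerical_score(mu, data))
```

Here the score vanishes at the sample mean, which is the maximum-likelihood estimate of $\mu$ for this model.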
The score function has mean zero under the true parameter value
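If $\theta$ is the parameter that generated our observed data (the true value, not an estimate):

$$
\mathbb{E}_{x \sim p(x \mid \theta)}\left[s(\theta)\right] = 0
$$

… i.e. in expectation over the data, the true parameter is a stationary point of the log-likelihood function.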
Note: A zero gradient is not sufficient for global optimality; it is also satisfied at local maxima, local minima, and saddle points, unless the parameter space is convex and the log-likelihood is concave in $\theta$, in which case every stationary point is a global maximum.
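The mean-zero property can be checked by simulation. The sketch below assumes the same illustrative Gaussian model with known $\sigma = 1$ and a hypothetical true mean of 2.0: data are drawn from the true model, and the per-observation score is averaged at the true value and at a wrong value. The helper name `mean_score` is an assumption for illustration.

```python
import random

def mean_score(mu: float, samples: list[float], sigma: float = 1.0) -> float:
    """Average per-observation score (x - mu) / sigma^2 over the samples."""
    return sum((x - mu) / sigma**2 for x in samples) / len(samples)

rng = random.Random(0)  # fixed seed so the run is reproducible
true_mu = 2.0           # assumed true parameter for this simulation
samples = [rng.gauss(true_mu, 1.0) for _ in range(100_000)]

print(mean_score(true_mu, samples))  # close to 0
print(mean_score(0.5, samples))      # close to true_mu - 0.5 = 1.5
```

At a wrong parameter value the expected score is not zero; for this model it equals $(\mu_{\text{true}} - \mu)/\sigma^2$, pointing back toward the true mean.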
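The expected value of the score function w.r.t. the model is zero:

$$
\begin{aligned}
\mathbb{E}_{x \sim p(x \mid \theta)}\left[s(\theta)\right]
&= \mathbb{E}_{x \sim p(x \mid \theta)}\left[\nabla_\theta \log p(x \mid \theta)\right] && \text{(1)} \\
&= \int p(x \mid \theta)\, \nabla_\theta \log p(x \mid \theta)\, dx && \text{(2)} \\
&= \int p(x \mid \theta)\, \frac{\nabla_\theta p(x \mid \theta)}{p(x \mid \theta)}\, dx && \text{(3)} \\
&= \nabla_\theta \int p(x \mid \theta)\, dx && \text{(4)} \\
&= \nabla_\theta 1 && \text{(5)} \\
&= 0 && \text{(6)}
\end{aligned}
$$

Line 3: Take the derivative of the log, $\nabla_\theta \log p(x \mid \theta) = \nabla_\theta p(x \mid \theta) / p(x \mid \theta)$.
Line 4: Cancel $p(x \mid \theta)$ and swap gradient and integral (valid under the usual regularity conditions).
Line 5: $p(x \mid \theta)$ is a PDF, so it integrates to 1.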
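For a discrete model the integral becomes a sum, and the argument can be verified exactly. The sketch below assumes a Bernoulli model with success probability $p$, whose score is $s(p) = x/p - (1 - x)/(1 - p)$, and uses exact rational arithmetic so the result is an exact zero rather than a rounded one. The function names are hypothetical.

```python
from fractions import Fraction

def bernoulli_pmf(x: int, p: Fraction) -> Fraction:
    """P(X = x) for an assumed Bernoulli(p) model."""
    return p if x == 1 else 1 - p

def bernoulli_score(x: int, p: Fraction) -> Fraction:
    """d/dp log p(x | p) = x/p - (1 - x)/(1 - p)."""
    return Fraction(x) / p - Fraction(1 - x) / (1 - p)

def expected_score(p: Fraction) -> Fraction:
    # The sum over the support {0, 1} replaces the integral in the derivation.
    return sum(bernoulli_pmf(x, p) * bernoulli_score(x, p) for x in (0, 1))

for p in (Fraction(1, 10), Fraction(1, 2), Fraction(7, 9)):
    print(p, expected_score(p))  # exactly 0 for every p in (0, 1)
```

Term by term, $x = 1$ contributes $p \cdot \frac{1}{p} = 1$ and $x = 0$ contributes $(1 - p) \cdot \frac{-1}{1 - p} = -1$, mirroring the cancellation in lines 3 to 5.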
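The covariance of the score function is called the Fisher information
It gives us a sense of the uncertainty of our estimate, i.e. it measures how much information our data $x$ provides about the parameter $\theta$:

$$
\mathcal{I}(\theta) = \operatorname{Cov}_{x \sim p(x \mid \theta)}\left[s(\theta)\right] = \mathbb{E}_{x \sim p(x \mid \theta)}\left[s(\theta)\, s(\theta)^\top\right]
$$

The second equality uses the fact that the score has mean zero.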
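As a sketch of this definition, the code below assumes the same illustrative Bernoulli model and computes the covariance of the score exactly as $\mathbb{E}[s^2] - \mathbb{E}[s]^2$ by summing over $x \in \{0, 1\}$. It compares the result with the closed form $\mathcal{I}(p) = 1/(p(1 - p))$. The function names are hypothetical.

```python
from fractions import Fraction

def bernoulli_score(x: int, p: Fraction) -> Fraction:
    """Score of one Bernoulli observation: x/p - (1 - x)/(1 - p)."""
    return Fraction(x) / p - Fraction(1 - x) / (1 - p)

def fisher_information(p: Fraction) -> Fraction:
    """Cov[s] = E[s^2] - E[s]^2, computed exactly over the support {0, 1}."""
    probs = {0: 1 - p, 1: p}
    mean = sum(probs[x] * bernoulli_score(x, p) for x in (0, 1))
    second_moment = sum(probs[x] * bernoulli_score(x, p) ** 2 for x in (0, 1))
    return second_moment - mean**2

for p in (Fraction(1, 2), Fraction(1, 10)):
    # Prints p, the computed information, and the closed form 1 / (p (1 - p)).
    print(p, fisher_information(p), 1 / (p * (1 - p)))
# → 1/2 4 4
# → 1/10 100/9 100/9
```

The link to uncertainty is visible in this model: the maximum-likelihood estimate from $n$ observations (the sample mean) has variance $p(1 - p)/n = 1/(n\,\mathcal{I}(p))$, so higher information means a less uncertain estimate.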