Score function
The term “score function” is a historical holdover from the context in which it was first used (genetic analysis).
The score function (or score) is the gradient of the log-likelihood function with respect to the parameter vector.
For a single observation, it measures how sensitive the log-likelihood is to small changes in the parameter at that point.
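In symbols (notation introduced here for concreteness), for a single observation $x$ with density $f(x; \theta)$, the score is

$$
s(\theta; x) \;=\; \nabla_\theta \log f(x; \theta).
$$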
In general, the score function tells us how we should adjust our parameters to increase the likelihood of observing our data.
A positive score suggests that increasing the parameter would improve the fit, while a negative score suggests decreasing it.
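As a minimal sketch (the model and numbers are chosen here for illustration, not taken from the text), consider a Bernoulli model with success probability p: the per-observation score is x/p - (1 - x)/(1 - p), and its sign tells us which way to move p.

```python
def bernoulli_score(p, x):
    """Score of a single Bernoulli observation x in {0, 1}:
    d/dp log f(x; p) = x/p - (1 - x)/(1 - p)."""
    return x / p - (1 - x) / (1 - p)

p = 0.3                       # hypothetical current guess for the parameter
print(bernoulli_score(p, 1))  # positive: observing a success favors a larger p
print(bernoulli_score(p, 0))  # negative: observing a failure favors a smaller p
```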
Properties of the score function
The score function has mean zero when evaluated at the true parameter value $\theta_0$ (the parameter that generated our observed data, not an estimate):

$$
\mathbb{E}_{\theta_0}\!\left[\, s(\theta_0; X) \,\right] = 0.
$$
This means that, when evaluated at the true parameter, the score exerts no systematic “push” in any direction: the fluctuations cancel out on average, because the expected log-likelihood is maximized at the true parameter. The variance of the score is called the Fisher information, $\mathcal{I}(\theta) = \operatorname{Var}_{\theta}\!\left[\, s(\theta; X) \,\right]$, which measures how much information a random variable carries about the parameter $\theta$.
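A quick simulation sketch (assuming the same Bernoulli model as above, with an arbitrarily chosen true parameter) can check both facts empirically: the sample mean of the per-observation scores at the true parameter is close to zero, and their sample variance is close to the Fisher information, which for a Bernoulli model is 1/(p(1-p)).

```python
import numpy as np

rng = np.random.default_rng(0)
p_true = 0.3                                 # true parameter used to generate the data
x = rng.binomial(1, p_true, size=200_000)    # simulated observations

# Per-observation scores, evaluated at the true parameter.
scores = x / p_true - (1 - x) / (1 - p_true)

print(scores.mean())                # close to 0: mean-zero property
print(scores.var())                 # close to the Fisher information ...
print(1 / (p_true * (1 - p_true)))  # ... which is 1/(p(1-p)) ~= 4.76 here
```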
Note: a zero gradient is not sufficient for global optimality; it can also occur at local maxima, local minima, saddle points, … unless the parameter space is convex and the log-likelihood function is concave, in which case any stationary point is a global maximum.
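For instance (an illustrative example not from the original text), the Cauchy location model with two well-separated observations has a non-concave log-likelihood: the score vanishes at three points, two local maxima near the observations and a local minimum between them. A rough numerical sketch:

```python
import numpy as np

# Two well-separated observations under a Cauchy location model (scale fixed at 1).
x = np.array([-3.0, 3.0])

def score(theta):
    """d/dtheta of sum_i log Cauchy(x_i; theta, 1),
    i.e. sum_i 2*(x_i - theta) / (1 + (x_i - theta)**2)."""
    return np.sum(2 * (x - theta) / (1 + (x - theta) ** 2))

grid = np.linspace(-6.0, 6.0, 2000)   # grid chosen so it does not hit 0 exactly
s = np.array([score(t) for t in grid])

# Sign changes of the score locate its approximate roots (stationary points).
roots = grid[:-1][np.sign(s[:-1]) != np.sign(s[1:])]
print(roots)   # roughly [-2.8, 0.0, 2.8]: local max, local min, local max
```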