“average local curvature of log-likelihood per observation” (i.e., expected squared score);
or, informally: “how quickly the data punish small parameter mistakes, per sample.”
“FIM tells you how sharply the model reacts to parameter nudges (geometry/curvature).”
Motivation
The covariance of the score function is called the Fisher information.
It gives us a sense of the uncertainty of our estimate, i.e. it measures how much information our data provide about the parameter $\theta$:
Fisher Information
The Fisher information quantifies how much information a random variable $X$ carries about an unknown parameter $\theta$ that describes its probability distribution. For a PDF $f(x; \theta)$, it is defined as:

$$I(\theta) = \mathbb{E}\!\left[\left(\frac{\partial}{\partial \theta} \log f(X; \theta)\right)^{2}\right] = -\,\mathbb{E}\!\left[\frac{\partial^{2}}{\partial \theta^{2}} \log f(X; \theta)\right]$$

where the expectation is taken with respect to $f(x; \theta)$.
Fisher information measures the curvature of the log-likelihood function around the true parameter value.
A higher Fisher information indicates:
- The distribution is more sensitive to changes in the parameter
- We can estimate $\theta$ more precisely from observations
- The likelihood function has a sharper peak around the true value
The first form of the definition shows that Fisher information is the variance of the score function: the score has expectation zero (proof below), so its expected square is exactly its variance.
The second form of the definition (using the second derivative) directly shows the connection to curvature, since the second derivative measures how quickly the slope changes: a more negative second derivative indicates a sharper peak in the log-likelihood.
Cramér-Rao bound: The variance of any unbiased estimator $\hat{\theta}$ is bounded below by the inverse of the Fisher information:

$$\operatorname{Var}(\hat{\theta}) \geq \frac{1}{I(\theta)}$$
→ Fisher information directly determines the best possible precision we can achieve when estimating parameters from data.
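One step worth making explicit (a standard fact, not spelled out above): for $n$ i.i.d. observations the Fisher information adds up, so the bound tightens linearly with the sample size:

$$I_{n}(\theta) = n\,I(\theta) \quad\Longrightarrow\quad \operatorname{Var}(\hat{\theta}) \geq \frac{1}{n\,I(\theta)}$$

This is what the informal "per sample" reading at the top refers to.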
For a normal distribution $\mathcal{N}(\mu, \sigma^{2})$, the Fisher information with respect to the mean $\mu$ is:

$$I(\mu) = \frac{1}{\sigma^{2}}$$
→ We can estimate the mean more precisely (higher Fisher information) when the variance is smaller.
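A minimal sketch to check this numerically, assuming NumPy; the variable names are mine, not from the original. It estimates $I(\mu)$ as the Monte Carlo variance of the score $(x - \mu)/\sigma^{2}$:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 1.5
n = 1_000_000

# Draw n samples from N(mu, sigma^2).
x = rng.normal(mu, sigma, size=n)

# Score of a single observation: d/dmu log f(x; mu) = (x - mu) / sigma^2.
score = (x - mu) / sigma**2

# Fisher information is the variance of the score (its mean is ~0).
print(f"Monte Carlo estimate of I(mu): {score.var():.4f}")
print(f"Closed form 1/sigma^2:         {1 / sigma**2:.4f}")
```

Consistent with the Cramér-Rao bound, the sample mean of $n$ such draws has variance $\sigma^{2}/n = 1/(n\,I(\mu))$, i.e. it attains the bound exactly.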
FIM … Fisher Information Matrix
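Spelled out for a vector parameter $\theta \in \mathbb{R}^{k}$ (the standard multivariate generalization of the scalar definition above), the FIM is the $k \times k$ matrix with entries

$$[\mathcal{I}(\theta)]_{ij} = \mathbb{E}\!\left[\frac{\partial \log f(X;\theta)}{\partial \theta_{i}} \, \frac{\partial \log f(X;\theta)}{\partial \theta_{j}}\right] = -\,\mathbb{E}\!\left[\frac{\partial^{2} \log f(X;\theta)}{\partial \theta_{i}\,\partial \theta_{j}}\right]$$

i.e. the covariance matrix of the score vector $\nabla_{\theta} \log f(X;\theta)$.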
Fisher information is the covariance of the score function
Proof + relation to Hessian
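A sketch of the standard argument, under the usual regularity conditions that allow differentiating under the integral sign. First, the score has mean zero:

$$\mathbb{E}\!\left[\frac{\partial}{\partial\theta}\log f(X;\theta)\right] = \int \frac{\partial_{\theta} f(x;\theta)}{f(x;\theta)}\, f(x;\theta)\,dx = \frac{\partial}{\partial\theta}\int f(x;\theta)\,dx = \frac{\partial}{\partial\theta}\,1 = 0$$

So the covariance of the score is its expected square, which is exactly the first form of $I(\theta)$. For the Hessian relation, differentiate the score once more:

$$\frac{\partial^{2}}{\partial\theta^{2}}\log f = \frac{\partial_{\theta}^{2} f}{f} - \left(\frac{\partial_{\theta} f}{f}\right)^{2}$$

Taking expectations, the first term integrates to $\partial_{\theta}^{2} \int f(x;\theta)\,dx = 0$, leaving

$$-\,\mathbb{E}\!\left[\frac{\partial^{2}}{\partial\theta^{2}}\log f(X;\theta)\right] = \mathbb{E}\!\left[\left(\frac{\partial}{\partial\theta}\log f(X;\theta)\right)^{2}\right] = I(\theta)$$

which is the second form of the definition and, in the multivariate case, says the FIM equals the negative expected Hessian of the log-likelihood.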