negative log-likelihood loss

Categorical Cross-Entropy Loss

For one-hot encoded labels, the cross-entropy loss simplifies to the categorical cross-entropy (CCE), also known as the negative log-likelihood loss:
$\ell_{CCE}^{(i)} = -\sum_{k} y_{i,k} \log p_{\theta}(k \mid x_{i}) = -\log p_{\theta}(y_{i} \mid x_{i})$
$i$ … $i^{th}$ sample
$k$ … class index; because $y_{i}$ is one-hot, only the term for the correct class is non-zero
$y$ … true label (one-hot, all probability mass is on the true class)
$p$ … predicted probability distribution (from softmax, …)
$θ$ … model parameters
$x$ … input features
→ $\hat{p}_{i} = \mathrm{softmax}(f_{\theta}(x_{i})), \quad \ell_{CCE}^{(i)} = -\log \hat{p}_{i, y_{i}}$
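
As a minimal sketch of the two formulas above, the snippet below computes the loss for one sample in plain Python. The names `softmax` and `cce_loss` and the example logits are illustrative assumptions, not part of the original note. It evaluates both the full sum over classes and the shortcut that reads off the probability of the true class, which should agree.

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability; the result is unchanged.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cce_loss(logits, true_class):
    # Returns the loss computed two ways for a single sample:
    # (1) the full sum over classes against a one-hot target,
    # (2) the shortcut -log(p[true_class]).
    probs = softmax(logits)
    one_hot = [1.0 if k == true_class else 0.0 for k in range(len(logits))]
    full_sum = -sum(y * math.log(p) for y, p in zip(one_hot, probs))
    shortcut = -math.log(probs[true_class])
    return full_sum, shortcut

# Hypothetical model outputs f_theta(x_i) for a 3-class problem.
full_sum, shortcut = cce_loss([2.0, 1.0, 0.1], true_class=0)
print(full_sum, shortcut)
```

The one-hot sum is only there to mirror the formula; in practice the shortcut form is what gets computed, since every other term is multiplied by zero.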

$\log \frac{1}{p} = -\log p$, see surprise.
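
To make the link to surprise concrete, here is a small sketch (the function name `surprise` and the chosen probabilities are assumptions for illustration). It shows that $\log \frac{1}{p}$ and $-\log p$ coincide, and that the loss grows as the probability assigned to the true class shrinks.

```python
import math

def surprise(p):
    # Surprise (self-information) of an outcome with probability p, in nats.
    return math.log(1.0 / p)

# The lower the probability assigned to the true class, the larger the loss.
for p in (0.9, 0.5, 0.1, 0.01):
    print(f"p={p:<5} log(1/p)={surprise(p):.3f}  -log(p)={-math.log(p):.3f}")
```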

References

Understanding softmax and the negative log-likelihood

cross-entropy
classification

Max Wolf's Second Brain
