See also: cross-entropy
(Categorical) cross-entropy loss is used for classification tasks, in conjunction with softmax:

$$\mathrm{CE} = -\sum_{i} \sum_{j} y_{ij} \log(\hat{y}_{ij})$$

- $i$ … sample
- $j$ … label / output index
- $y$ … target values
- $\hat{y}$ … predicted values
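A tiny numerical sketch of the formula (the one-hot targets and softmax outputs below are made-up toy values):

```python
import numpy as np

# 2 samples, 3 classes: y are one-hot targets, y_hat are softmax outputs (toy values)
y = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
y_hat = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])

ce = -np.sum(y * np.log(y_hat))   # sum over samples i and labels j
print(ce)                         # ≈ 0.87
```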
For one-hot encodings, cross-entropy loss simplifies to Categorical Cross-Entropy, aka negative log-likelihood loss:

$$\mathrm{NLL} = -\sum_{i} \log(\hat{y}_{i,c})$$

- $c$ … index for correct class
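Continuing with the same toy softmax outputs: because the targets are one-hot, only the term of the correct class survives, so the loss reduces to summing $-\log(\hat{y}_{i,c})$ over the samples:

```python
import numpy as np

# Same toy softmax outputs as above; targets given as class indices instead of one-hot
y_hat = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])
c = np.array([0, 2])              # c … index of the correct class per sample

nll = -np.sum(np.log(y_hat[np.arange(len(c)), c]))
print(nll)                        # ≈ 0.87, same value as the full cross-entropy above
```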
Unlike the negative log-likelihood loss, which doesn’t punish based on prediction confidence, Cross-Entropy punishes incorrect but confident predictions, as well as correct but less confident predictions. [1]
Cross entropy indicates the distance between what the model believes the output distribution should be, and what the original distribution really is.
The cross-entropy measure is a widely used alternative to squared error. It is used when node activations can be interpreted as the probability that each hypothesis is true, i.e. when the output is a probability distribution. It is therefore used as a loss function in neural networks that have softmax activations in the output layer. [2]
nn.CrossEntropyLoss <=> nn.LogSoftmax + nn.NLLLoss
nn.BCEWithLogitsLoss <=> nn.Sigmoid + nn.BCELoss
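A quick way to check these equivalences in PyTorch (the tensors are made-up example inputs):

```python
import torch
import torch.nn as nn

logits = torch.randn(4, 3)            # 4 samples, 3 classes
targets = torch.tensor([0, 2, 1, 2])  # class indices

ce = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)
assert torch.allclose(ce, nll)

bin_logits = torch.randn(4)
bin_targets = torch.tensor([0.0, 1.0, 1.0, 0.0])

bce_logits = nn.BCEWithLogitsLoss()(bin_logits, bin_targets)
bce = nn.BCELoss()(nn.Sigmoid()(bin_logits), bin_targets)
assert torch.allclose(bce_logits, bce)
```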
When does it not perform well?
- strong class imbalance (the model just becomes more confident in predicting the majority class and neglects the minority class)
- it fails to differentiate between easy and hard samples (hard example = the model makes significant errors; easy example = straightforward to classify); CE doesn’t allocate more attention to hard samples
Mitigations:
addressing class imbalance: balanced CE loss
Adding a weighting factor for each class resolves the issue with class imbalance:

$$\mathrm{CE}_{\text{balanced}} = -\sum_{c} w_c \, y_c \log(p_c)$$

- $p_c$ … predicted probability of class $c$ (from softmax, …)
- $w_c$ … weight for class $c$

$w_c$ is usually calculated as the inverse of the class distribution, i.e. $w_c = \frac{N}{N_c}$ with $N$ … total number of samples and $N_c$ … number of samples of class $c$
(if the dataset is very large, take a random sample that fits into memory)
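A minimal sketch of inverse-frequency weights in PyTorch, assuming the class counts come from a toy label tensor (the numbers and the normalization step are illustrative choices):

```python
import torch
import torch.nn as nn

labels = torch.tensor([0, 0, 0, 0, 0, 0, 1, 1, 2])  # imbalanced toy dataset
counts = torch.bincount(labels, minlength=3).float()
weights = counts.sum() / counts                      # w_c = N / N_c (inverse frequency)
weights = weights / weights.sum()                    # optional: normalize the weights

criterion = nn.CrossEntropyLoss(weight=weights)      # per-class weighting built into PyTorch
logits = torch.randn(len(labels), 3)
loss = criterion(logits, labels)
```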
addressing hard negatives: focal loss
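Focal loss down-weights easy examples by scaling the CE term with $(1 - p_t)^\gamma$, so training focuses on hard samples. The sketch below is an illustrative multi-class variant (the function name, default gamma, and optional per-class alpha weights are assumptions, not a reference implementation):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=None):
    """Illustrative multi-class focal loss: cross-entropy scaled by (1 - p_t)^gamma."""
    log_probs = F.log_softmax(logits, dim=1)
    ce = F.nll_loss(log_probs, targets, weight=alpha, reduction="none")  # per-sample CE
    p_t = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1).exp()     # prob of true class
    return ((1.0 - p_t) ** gamma * ce).mean()

# Usage with made-up tensors:
logits = torch.randn(4, 3)
targets = torch.tensor([0, 2, 1, 2])
loss = focal_loss(logits, targets)
```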
References
https://towardsdatascience.com/cross-entropy-negative-log-likelihood-and-all-that-jazz-47a95bd2e81