See also: cross-entropy

(Categorical) cross-entropy loss is used for classification tasks, in conjunction with softmax:

$$L = -\sum_{i} \sum_{j} y_{ij} \, \log(\hat{y}_{ij})$$

$i$ … sample
$j$ … label / output index
$y_{ij}$ … target values
$\hat{y}_{ij}$ … predicted values
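As a minimal sketch of this sum (a hypothetical helper, not part of any library; it assumes y_true and y_pred are (N, C) tensors of probabilities):

import torch

def cross_entropy(y_true, y_pred, eps=1e-12):
    # -sum_j y_ij * log(y_hat_ij), averaged over the N samples (matches PyTorch's default 'mean' reduction); eps avoids log(0)
    return -(y_true * torch.log(y_pred + eps)).sum(dim=1).mean()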

For one-hot encodings, cross-entropy loss simplifies to Categorical Cross-Entropy, aka negative log-likelihood loss:

$$L_i = -\log(\hat{y}_{i,c})$$

$c$ … index for the correct class
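For example, with (made-up) predicted probabilities $\hat{y}_i = (0.1, 0.7, 0.2)$ and a one-hot target selecting the second class ($c = 2$), only the correct-class term survives: $L_i = -\log(0.7) \approx 0.36$.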

Unlike the negative log-likelihood loss, which doesn’t punish based on prediction confidence, Cross-Entropy punishes incorrect but confident predictions, as well as correct but less confident predictions. 1

Cross entropy indicates the distance between what the model believes the output distribution should be, and what the original distribution really is.

It is defined as

$$H(y, \hat{y}) = -\sum_{j} y_j \log(\hat{y}_j)$$

The cross-entropy measure is a widely used alternative to squared error. It is used when node activations can be understood as representing the probability that each hypothesis might be true, i.e. when the output is a probability distribution. Thus it is used as a loss function in neural networks which have softmax activations in the output layer. 2

nn.CrossEntropyLoss <=> nn.LogSoftmax + nn.NLLLoss
nn.BCEWithLogitsLoss <=> nn.Sigmoid + nn.BCELoss
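A quick sanity check of the first equivalence, as a sketch with made-up logits and targets:

import torch
import torch.nn as nn

logits = torch.randn(4, 3)            # raw scores for 4 samples, 3 classes
targets = torch.tensor([0, 2, 1, 2])  # class indices

# CrossEntropyLoss applies log-softmax and NLL loss internally
ce = nn.CrossEntropyLoss()(logits, targets)

# the same computation spelled out explicitly
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)

assert torch.allclose(ce, nll)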


when does cross-entropy not perform well?

  • strong class imbalance (the model just becomes more confident in predicting the majority class and neglects the minority class)
  • fails to differentiate between easy and hard samples. Hard example = model makes significant errors; easy example = straightforward to classify. CE doesn’t allocate more attention to hard samples

Mitigations:

addressing class imbalance: balanced CE loss

adding a weighting factor for each class resolves the issue with class imbalances:

$$\mathrm{CE}_{\text{balanced}}(p_t) = -\alpha_t \, \log(p_t)$$

$p_t$ … predicted probability of the true class (from softmax, …)
$\alpha_t$ … weight for class $t$

$\alpha_t$ is usually calculated as the inverse of the class distribution, like so:

import torch

# collect all labels, count how often each class occurs, and weight by inverse frequency
sample_labels = torch.cat([dataset[i][1] for i in range(len(dataset))])
unique, counts = torch.unique(sample_labels, return_counts=True)
class_weights = counts.sum() / (len(unique) * counts)

(if the dataset is very large, take a random sample that fits into memory)
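The resulting tensor can then be handed to the loss directly, e.g. (reusing class_weights from the snippet above):

import torch.nn as nn

# per-class weights rescale each class's contribution to the loss
criterion = nn.CrossEntropyLoss(weight=class_weights)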

addressing hard-negatives: focal loss
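For reference, the focal loss (Lin et al., 2017) extends the balanced CE term above with a modulating factor that down-weights easy examples:

$$\mathrm{FL}(p_t) = -\alpha_t \, (1 - p_t)^{\gamma} \, \log(p_t)$$

$\gamma$ … focusing parameter; $\gamma = 0$ recovers balanced CE, larger values shift more of the loss onto hard (low-$p_t$) examples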

Footnotes

  1. https://neptune.ai/blog/pytorch-loss-functions

  2. https://deepnotes.io/softmax-crossentropy