Cross-Entropy Loss

The cross-entropy loss is often used for classification tasks, in conjunction with softmax:

$$\mathcal{L}_{CE}(\theta) = -\sum_{i} \sum_{c} y_{i,c} \, \log \hat{y}_{i,c}(x_i; \theta)$$

$i$ … sample
$c$ … class index
$y_{i,c}$ … true distribution (soft labels, one-hot, …)
$\hat{y}_{i,c}$ … predicted probability distribution (from softmax, …)
$\theta$ … model parameters
$x_i$ … input features
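
A tiny numeric sketch of the definition for a single sample (the probabilities are made up purely for illustration):

import torch

y = torch.tensor([0.1, 0.8, 0.1])      # true distribution (soft labels)
y_hat = torch.tensor([0.2, 0.7, 0.1])  # predicted distribution (softmax output)

# cross-entropy for one sample: -sum_c y_c * log(y_hat_c)
ce = -(y * y_hat.log()).sum()          # ≈ 0.68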

Categorical Cross-Entropy Loss

For one-hot encodings, the cross-entropy loss simplifies to Categorical Cross-Entropy, aka negative log-likelihood loss:

$$\mathcal{L}_{NLL}(\theta) = -\sum_{i} \log \hat{y}_{i,c_i}(x_i; \theta)$$

$i$ … sample
$c_i$ … index of the correct class
$y_i$ … true label (one-hot, all probability mass is on the true class)
$\hat{y}_i$ … predicted probability distribution (from softmax, …)
$\theta$ … model parameters
$x_i$ … input features
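
Because the target is one-hot, every term except the correct class drops out, so the loss is just the negative log-probability of the correct class. A quick check (values made up):

import torch

y = torch.tensor([0.0, 1.0, 0.0])      # one-hot target, correct class index 1
y_hat = torch.tensor([0.2, 0.7, 0.1])  # softmax output

full_ce = -(y * y_hat.log()).sum()     # general cross-entropy
nll = -y_hat[1].log()                  # negative log-likelihood of the true class
assert torch.isclose(full_ce, nll)     # both equal -log(0.7) ≈ 0.357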


nn.CrossEntropyLoss <=> nn.LogSoftmax & nn.NLLLoss
nn.BCEWithLogitsLoss <=> nn.Sigmoid & nn.BCELoss
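
A quick way to convince yourself of both equivalences (random inputs, purely for checking):

import torch
import torch.nn as nn

logits = torch.randn(4, 3)                     # raw scores: 4 samples, 3 classes
targets = torch.tensor([0, 2, 1, 2])           # class indices

ce = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)
assert torch.isclose(ce, nll)

bin_logits = torch.randn(4)                    # raw scores for a binary task
bin_targets = torch.tensor([0., 1., 1., 0.])
bce_logits = nn.BCEWithLogitsLoss()(bin_logits, bin_targets)
bce = nn.BCELoss()(torch.sigmoid(bin_logits), bin_targets)
assert torch.isclose(bce_logits, bce)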


When does it not perform well?

  • strong class imbalance (the model just becomes more confident in predicting the majority class and neglects the minority class)
  • it fails to differentiate between easy and hard samples: a hard example is one the model makes significant errors on, an easy example is one that is straightforward to classify. CE does not allocate more attention to hard samples

Mitigations:

Addressing class imbalance: balanced CE loss

Adding a per-class weighting factor mitigates the class-imbalance problem:

$$\mathcal{L}_{balanced}(\theta) = -\sum_{i} \sum_{c} \alpha_c \, y_{i,c} \, \log \hat{y}_{i,c}(x_i; \theta)$$

$\hat{y}_{i,c}$ … predicted probability of class $c$ (from softmax, …)
$\alpha_c$ … weight for class $c$

$\alpha_c$ is usually calculated as the inverse of the class distribution, like so:

import torch

# assumes dataset[i] returns an (input, label) pair with integer class labels
sample_labels = torch.tensor([dataset[i][1] for i in range(len(dataset))])
unique, counts = torch.unique(sample_labels, return_counts=True)
class_weights = counts.sum() / (len(unique) * counts)  # inverse class frequency; 1 for every class if the data is balanced

(if the dataset is very large, take a random sample that fits into memory)
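
The resulting weights can then be passed straight to the loss via the weight argument of nn.CrossEntropyLoss:

import torch.nn as nn

# under-represented classes now contribute more to the loss
criterion = nn.CrossEntropyLoss(weight=class_weights)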

Addressing hard examples: focal loss
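
Focal loss scales the per-sample cross-entropy by $(1 - p_t)^\gamma$, where $p_t$ is the predicted probability of the true class, so easy, confidently-classified examples contribute little and hard examples dominate the gradient. A minimal multi-class sketch ($\gamma = 2$ is a common default; the function name and interface are illustrative, not a library API):

import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    # per-sample cross-entropy: ce_i = -log p_t of the true class
    ce = F.cross_entropy(logits, targets, reduction="none")
    p_t = torch.exp(-ce)                       # predicted probability of the true class
    # (1 - p_t)^gamma ≈ 0 for easy samples (p_t near 1), ≈ 1 for hard ones
    return ((1.0 - p_t) ** gamma * ce).mean()

It is usually combined with a per-class $\alpha$ weight, analogous to the balanced CE above.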

https://towardsdatascience.com/cross-entropy-negative-log-likelihood-and-all-that-jazz-47a95bd2e81
classification