- i … i-th sample
- k … class index
- q … true distribution (soft labels, one-hot, …)
- p … predicted probability distribution (from softmax, …)
- θ … model parameters
- x … input features
→ $\hat{p}_i = \operatorname{softmax}(f_\theta(x_i)), \qquad \ell_{\text{CE}}(i) = -q_i^\top \log \hat{p}_i$
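The soft-label form can be computed directly from logits with log-softmax. A minimal PyTorch sketch; the tensor names `logits` and `q` are placeholders, not from the original:

```python
import torch
import torch.nn.functional as F

# logits: raw model outputs f_theta(x), shape (batch, num_classes)
# q: true distribution per sample (soft labels), same shape, rows sum to 1
logits = torch.randn(4, 3)
q = torch.softmax(torch.randn(4, 3), dim=1)

log_p = F.log_softmax(logits, dim=1)      # log p_hat_i
ce_per_sample = -(q * log_p).sum(dim=1)   # -q_i^T log p_hat_i
```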
- i … i-th sample
- k … index of the correct class
- y … true label (one-hot: all probability mass is on the correct class)
- p … predicted probability distribution (from softmax, …)
- θ … model parameters
- x … input features
→ $\hat{p}_i = \operatorname{softmax}(f_\theta(x_i)), \qquad \ell_{\text{CCE}}(i) = -\log \hat{p}_{i,\,y_i}$
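With one-hot labels this reduces to picking out the log-probability of the correct class, which is what `torch.nn.functional.cross_entropy` computes from logits and integer labels. A short sketch with made-up `logits`/`labels` tensors:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 3)            # f_theta(x), shape (batch, num_classes)
labels = torch.tensor([0, 2, 1, 2])   # integer class indices y_i

# manual CCE: -log p_hat_{i, y_i}
log_p = F.log_softmax(logits, dim=1)
cce_manual = -log_p[torch.arange(len(labels)), labels]

# built-in equivalent (averaged over the batch by default)
cce_builtin = F.cross_entropy(logits, labels)
assert torch.allclose(cce_manual.mean(), cce_builtin)
```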
strong class imbalance (the model just becomes more confident in predicting the majority class and neglects the minority class)
fails to differentiate between easy and hard samples. A hard example is one the model gets badly wrong; an easy example is straightforward to classify. CE does not allocate more attention to hard samples.
Mitigations:
Addressing class imbalance: balanced CE loss
Adding a weighting factor for each class mitigates the class-imbalance issue:
$\text{CE}(p_t) = -\alpha_t \log(p_t)$
- p_t … predicted probability of the true class (from softmax, …)
- α_t … weight for class t

α is usually calculated as the inverse of the class frequency, like so:
$\alpha_t = \dfrac{\text{number of samples}}{\text{number of classes} \times \text{class count}_t}$
```python
import torch

# collect the label of every sample in the dataset
sample_labels = torch.tensor([int(dataset[i][1]) for i in range(len(dataset))])
# count how often each class occurs
unique, counts = torch.unique(sample_labels, return_counts=True)
# inverse-frequency weights: number of samples / (number of classes * class count_t)
class_weights = counts.sum() / (len(unique) * counts)
```
(if the dataset is very large, take a random sample that fits into memory)
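These weights can then be passed to PyTorch's built-in loss, whose `weight` argument applies the $-\alpha_t \log(p_t)$ weighting per sample according to its true class. A minimal sketch with made-up numbers standing in for the `class_weights` computed above:

```python
import torch
import torch.nn as nn

# hypothetical weights for a 3-class problem (e.g. the class_weights from above)
class_weights = torch.tensor([0.5, 1.2, 2.8])

# each sample's CE loss is scaled by the weight of its true class;
# with the default reduction, the batch loss is a weighted mean
criterion = nn.CrossEntropyLoss(weight=class_weights)

logits = torch.randn(4, 3)            # f_theta(x)
labels = torch.tensor([0, 2, 1, 2])   # true classes
loss = criterion(logits, labels)
```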