Fig: The focal loss down-weights easy examples with a factor of (1 − p_t)^γ. [1]

Range: [0, ∞) per example — 0 for a perfectly classified example, unbounded as p_t → 0.
Use case: Modulates the cross-entropy loss by down-weighting well-classified examples, putting more emphasis on hard negatives (larger γ = stronger effect). Useful for imbalanced datasets, where it outperforms α-balanced CE loss.
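
For reference, the paper's definition, with p_t the model's estimated probability of the ground-truth class:

$$p_t = \begin{cases} p & \text{if } y = 1 \\ 1 - p & \text{otherwise,} \end{cases} \qquad \mathrm{FL}(p_t) = -(1 - p_t)^{\gamma}\,\log(p_t)$$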

(1) When an example is misclassified and p_t is small, the modulating factor is near 1 and the loss is unaffected. As p_t → 1, the factor goes to 0 and the loss for well-classified examples is down-weighted.

(2) The focusing parameter γ smoothly adjusts the rate at which easy examples are down-weighted. When γ = 0, focal loss is equivalent to cross-entropy; as γ increases, the effect of the modulating factor increases as well.
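
To make (2) concrete, a quick check (plain Python; p_t = 0.9 is just an illustrative value) of how the modulating factor scales the loss of an easy example:

```python
# Modulating factor (1 - p_t)^gamma for an easy example (p_t = 0.9)
p_t = 0.9
for gamma in [0, 0.5, 1, 2, 5]:
    factor = (1 - p_t) ** gamma
    print(f"gamma={gamma}: CE loss scaled by {factor:.4g}")
```

With the paper's default γ = 2, such an example contributes 100× less loss than under plain cross-entropy, while a hard example with p_t = 0.1 is barely touched ((1 − 0.1)² = 0.81).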

The loss can also be (and in practice is) α-balanced, weighting each class with a factor α_t (α for the positive class, 1 − α for the negative): FL(p_t) = −α_t (1 − p_t)^γ log(p_t).
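
Putting the definition, the focusing parameter, and the α-balancing together — a minimal PyTorch sketch, assuming binary (sigmoid) targets as in dense detection heads. This is my own rendering of the formulas above, not the FAIR code; the defaults α = 0.25, γ = 2 are the values the paper reports working best:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0, reduction="mean"):
    """Alpha-balanced sigmoid focal loss (sketch).

    logits:  raw predictions, any shape
    targets: same shape, 1.0 for positives and 0.0 for negatives
    """
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    # p_t is the model's probability of the true class
    p_t = p * targets + (1 - p) * (1 - targets)
    loss = ce * (1 - p_t) ** gamma
    if alpha >= 0:
        # alpha_t weights positives by alpha, negatives by (1 - alpha)
        alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
        loss = alpha_t * loss
    if reduction == "mean":
        return loss.mean()
    if reduction == "sum":
        return loss.sum()
    return loss
```

Passing a negative `alpha` disables the balancing term; for multi-class softmax outputs the same idea applies with p_t taken as the softmax probability of the true class.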

References

Paper: Lin et al., Focal Loss for Dense Object Detection (ICCV 2017), https://arxiv.org/abs/1708.02002

https://medium.com/data-science-ecom-express/focal-loss-for-handling-the-issue-of-class-imbalance-be7addebd856

FAIR implementation

With label smoothing (one option sketched below):
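
The paper gives no canonical recipe for this; a common approach — my own hedged sketch, reusing the shape of the focal_loss above, with the smoothing value 0.1 an arbitrary choice — is to soften the hard 0/1 targets toward 0.5 before computing the focal term:

```python
import torch
import torch.nn.functional as F

def focal_loss_with_smoothing(logits, targets, alpha=0.25, gamma=2.0, smoothing=0.1):
    # Pull the hard 0/1 targets toward 0.5 (e.g. 1 -> 0.95, 0 -> 0.05).
    targets = targets * (1 - smoothing) + 0.5 * smoothing
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    # With soft targets, p_t becomes an interpolation rather than a hard select.
    p_t = p * targets + (1 - p) * (1 - targets)
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```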
Keras implementation is wrong?

Footnotes

  1. https://www.analyticsvidhya.com/blog/2020/08/a-beginners-guide-to-focal-loss-in-object-detection/