Cross-entropy
The cross-entropy is the average surprise (entropy) you get observing a random variable governed by probability distribution $p$, while believing in a model $q$:

$$H(p, q) = -\sum_x p(x) \log q(x) = H(p) + D_{KL}(p \| q)$$

Where $H(p)$ is the entropy of $p$, $D_{KL}(p \| q)$ is the KL-divergence between $p$ and $q$.
$p$ represents data/observations/a measured probability distribution,
$q$ represents a theory/model/description/approximation of $p$.
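A minimal sketch of these quantities in Python (assuming discrete distributions passed as probability vectors and base-2 logs, so everything is in bits; the helper names are my own):

```python
import numpy as np

def entropy(p):
    """H(p) = -sum_x p(x) log2 p(x), in bits (terms with p(x) = 0 contribute 0)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) log2 q(x), in bits.

    p: the true/generating distribution (the data), q: the model."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0  # outcomes the data never produces add no surprise
    return float(-np.sum(p[mask] * np.log2(q[mask])))

def kl_divergence(p, q):
    """D_KL(p || q) = H(p, q) - H(p): the extra average surprise caused by the model."""
    return cross_entropy(p, q) - entropy(p)
```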
CE can tell you how good your model is.
If your model is perfect, i.e. $q = p$, then the cross-entropy is simply equal to the entropy (uncertainty) of the true distribution: $H(p, p) = H(p)$.
The cross-entropy can never be lower than the entropy of the generating distribution, since the KL-divergence is never negative:

$$H(p, q) = H(p) + D_{KL}(p \| q) \geq H(p)$$
$p$ and $q$ are not interchangeable: in general $H(p, q) \neq H(q, p)$.
E.g.: If you believe a coin is fair (0.5/0.5), but it is rigged (0.99/0.01), then the CE is:

$$H(p, q) = -(0.99 \log_2 0.5 + 0.01 \log_2 0.5) = 1 \text{ bit}$$

If you believe it is rigged when it is actually fair, then the CE is:

$$H(p, q) = -(0.5 \log_2 0.99 + 0.5 \log_2 0.01) \approx 3.33 \text{ bits}$$
In the second case, the cross-entropy is much larger: half of the time you are extremely surprised to see tails, and this extreme surprise dominates the average surprise.
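A quick NumPy check of both coin cases (a sketch, with probabilities ordered as heads/tails and logs in base 2):

```python
import numpy as np

fair   = np.array([0.5, 0.5])    # heads, tails
rigged = np.array([0.99, 0.01])

# H(p, q) = -sum_x p(x) log2 q(x): true distribution p first, model q second.
believe_fair_coin_rigged = -np.sum(rigged * np.log2(fair))
believe_rigged_coin_fair = -np.sum(fair * np.log2(rigged))

print(believe_fair_coin_rigged)  # 1.0 bit
print(believe_rigged_coin_fair)  # ~3.33 bits

# Neither can drop below the entropy of the coin that actually generated the data:
print(-np.sum(rigged * np.log2(rigged)))  # ~0.08 bits
print(-np.sum(fair * np.log2(fair)))      # 1.0 bit
```

The large second value comes from the $-\log_2 0.01 \approx 6.64$ bits of surprise incurred on every tails flip.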