The cross-entropy (CE) is the average surprise (entropy) you get observing a random variable governed by probability distribution $p$, while believing in its model $q$:

$$H(p, q) = \sum_x p(x)\,\bigl(-\log q(x)\bigr) = -\sum_x p(x) \log q(x)$$

Where $H(p, q)$ is the (cross-)entropy, $-\log q(x)$ is the surprise of state $x$ under the model, and $p(x)$ is the probability of state $x$, or how common it is.
$p$ represents data/observations/a measured probability distribution; $q$ represents a theory/model/description/approximation of $p$.
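As a minimal sketch of this formula (the function name and the choice of base-2 logarithms are mine, not from the note), the cross-entropy of two discrete distributions can be computed directly:

```python
import math

def cross_entropy(p, q):
    """Average surprise -log2 q(x), weighted by how common each state actually is, p(x)."""
    return -sum(px * math.log2(qx) for px, qx in zip(p, q) if px > 0)

# A model that matches the data exactly gives the entropy of the data itself:
print(cross_entropy([0.5, 0.5], [0.5, 0.5]))  # 1.0 bit, the entropy of a fair coin
```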

CE can tell you how good your model is.
If your model is perfect, i.e. $q = p$, then the cross-entropy is simply equal to the entropy (uncertainty) of the true distribution: $H(p, p) = H(p)$.

The cross-entropy can never be lower than the entropy of the generating distribution:

$$H(p, q) \geq H(p)$$

$p$ and $q$ are not interchangeable: in general, $H(p, q) \neq H(q, p)$.
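One standard way to see the inequality above (a Jensen's-inequality argument not spelled out in the note; for simplicity assume $p(x) > 0$ for every state): since $-\log$ is convex,

$$H(p, q) - H(p) \;=\; \sum_x p(x) \log \frac{p(x)}{q(x)} \;\geq\; -\log \sum_x p(x)\,\frac{q(x)}{p(x)} \;=\; -\log \sum_x q(x) \;=\; -\log 1 \;=\; 0$$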

E.g.: If you believe a coin is fair (0.5/0.5), but it is rigged (0.99/0.01), then the CE is:

$$H(p, q) = -\bigl(0.99 \log_2 0.5 + 0.01 \log_2 0.5\bigr) = 1 \text{ bit}$$

If you believe it is rigged when it is actually fair, then the CE is:

$$H(p, q) = -\bigl(0.5 \log_2 0.99 + 0.5 \log_2 0.01\bigr) \approx 3.33 \text{ bits}$$

In the second case, the cross-entropy is much larger: half of the time you are extremely surprised to see tails, and this extreme surprise dominates the average surprise.
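A quick numeric check of both cases, reusing the cross_entropy sketch from above (base-2 logs, i.e. bits, are my assumption):

```python
import math

def cross_entropy(p, q):
    return -sum(px * math.log2(qx) for px, qx in zip(p, q) if px > 0)

fair, rigged = (0.5, 0.5), (0.99, 0.01)

print(cross_entropy(rigged, fair))   # truth rigged, belief fair:   1.0 bit
print(cross_entropy(fair, rigged))   # truth fair, belief rigged: ~3.33 bits
```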

The KL-divergence subtracts the uncertainty (entropy) about the true distribution from the cross-entropy, leaving us with a measure of how similar the two distributions are:

$$D_{KL}(p \,\|\, q) = H(p, q) - H(p)$$
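Continuing the coin example (the same hedged sketch as above, with base-2 logs assumed), the KL-divergence is the extra surprise you pay for believing the wrong model, on top of the unavoidable uncertainty of the truth:

```python
import math

def cross_entropy(p, q):
    return -sum(px * math.log2(qx) for px, qx in zip(p, q) if px > 0)

fair, rigged = (0.5, 0.5), (0.99, 0.01)

# D_KL(p || q) = H(p, q) - H(p), and H(p) = H(p, p)
kl = cross_entropy(fair, rigged) - cross_entropy(fair, fair)
print(kl)  # ~2.33 extra bits of surprise per toss from using the wrong model
```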