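W.r.t. the cross-entropy loss $\mathcal{L}$, the perplexity is defined as:

$$\mathrm{PPL} = e^{\mathcal{L}} = \exp\left(-\frac{1}{N}\sum_{i=1}^{N}\log p_i\right) = \left(\prod_{i=1}^{N}\frac{1}{p_i}\right)^{1/N}$$

where $p_i$ is the probability the model assigned to the true label of example $i$, and $\log$ is the natural logarithm.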
It’s the geometric mean of the inverse predicted probabilities.
→ Perplexity is dimensionless and multiplicative: halving perplexity doubles the geometric mean of the probabilities the model assigned to the true labels.
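
The two views above (exponentiated cross-entropy and geometric mean of inverse probabilities) can be checked numerically. The following is a minimal sketch, assuming natural logarithms and a list of probabilities assigned to the true labels; the function names and the probability values are made up for illustration.

```python
import math


def perplexity(true_label_probs):
    """Exponential of the mean negative log-probability (natural log)."""
    n = len(true_label_probs)
    cross_entropy = -sum(math.log(p) for p in true_label_probs) / n
    return math.exp(cross_entropy)


def geometric_mean(values):
    """Geometric mean, computed in log space for numerical stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical probabilities the model assigned to the true labels.
probs = [0.5, 0.25, 0.125]

ppl = perplexity(probs)
gm_inverse = geometric_mean([1.0 / p for p in probs])

# Doubling every probability doubles their geometric mean
# and therefore halves the perplexity.
ppl_doubled = perplexity([2 * p for p in probs])

print(ppl, gm_inverse, ppl_doubled)
```

Here the perplexity and the geometric mean of the inverse probabilities agree, and doubling the probabilities cuts the perplexity in half.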

Effective branching factor

Perplexity: The model is, on average (geometric mean), as confused as if it had to pick uniformly from $\mathrm{PPL}$ candidates.

EXAMPLE

For a uniform distribution over $V$ candidates (e.g. vocabulary size), we have cross-entropy $\mathcal{L} = \log V$, so $\mathrm{PPL} = e^{\log V} = V$.

The lowest possible perplexity is attained at cross-entropy $\mathcal{L} = 0$, i.e. $\mathrm{PPL} = e^{0} = 1$.
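
Both boundary cases can be reproduced with a few lines. This is a sketch under the same assumptions as the definition (natural log, probabilities of the true labels); the vocabulary size and sample count are arbitrary illustrative values.

```python
import math


def perplexity(true_label_probs):
    """Exponential of the mean negative log-probability (natural log)."""
    n = len(true_label_probs)
    cross_entropy = -sum(math.log(p) for p in true_label_probs) / n
    return math.exp(cross_entropy)


vocab_size = 50_000  # hypothetical number of candidates
num_samples = 10     # arbitrary; the result does not depend on it

# Uniform model: every true label gets probability 1 / vocab_size.
uniform_ppl = perplexity([1.0 / vocab_size] * num_samples)

# Perfect model: every true label gets probability 1.
perfect_ppl = perplexity([1.0] * num_samples)

print(uniform_ppl, perfect_ppl)
```

Up to floating-point rounding, the uniform model's perplexity equals the number of candidates, and the perfect model's perplexity is 1.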