Types of policies
Deterministic policy: $a_t = \mu_\theta(s_t)$
Stochastic policy: $a_t \sim \pi_\theta(\cdot \mid s_t)$
Two major types of stochastic policy are categorical policies for discrete action spaces and diagonal Gaussian policies for continuous action spaces.
Categorical Policy
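As an illustration of sampling from a categorical policy, here is a minimal sketch. It assumes the policy produces one unnormalized score (logit) per discrete action and turns the scores into probabilities with a softmax. The names `softmax` and `sample_categorical` and the example logits are hypothetical, chosen for this sketch only.

```python
import math
import random


def softmax(logits):
    """Convert unnormalized scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_categorical(logits, rng):
    """Draw one action index according to the softmax probabilities."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical logits standing in for a network's output for one observation.
logits = [2.0, 0.5, -1.0]
rng = random.Random(0)
action = sample_categorical(logits, rng)  # an integer in {0, 1, 2}
```

Actions with larger logits are sampled more often, but every action keeps a nonzero probability unless its logit is extremely small relative to the others.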
Diagonal Gaussian Policy
Diagonal Gaussian policies map from observations to mean actions $\mu_\theta(s)$.
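There are two common ways to represent the covariance matrix, which is diagonal and so is fully described by a vector of standard deviations $\sigma$:
A parameter vector of log standard deviations $\log \sigma$, which is not a function of state.
Network layers mapping from states to log standard deviations $\log \sigma_\theta(s)$, which may share parameters with $\mu_\theta(s)$.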
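Given the mean $\mu_\theta(s)$ and the standard deviation $\sigma$, we can sample an action as $a = \mu_\theta(s) + \sigma \odot z$, where $z \sim \mathcal{N}(0, I)$.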
Log standard deviations are used so that we don't have to constrain the network output to be positive: a log standard deviation can be any real number, and we can simply exponentiate it to obtain $\sigma$, without losing anything.
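The sampling step and the role of the log standard deviations can be sketched as follows. This is a minimal illustration using plain Python lists, not any particular library's API. The function name `sample_diag_gaussian` and the numeric values for `mean` and `log_std` are hypothetical stand-ins for what a network or parameter vector would provide.

```python
import math
import random


def sample_diag_gaussian(mean, log_std, rng):
    """Sample a = mean + exp(log_std) * z elementwise, with z ~ N(0, 1)."""
    # exp() maps any real number to a strictly positive standard deviation,
    # so log_std needs no constraint.
    std = [math.exp(ls) for ls in log_std]
    # Each action dimension is sampled independently (diagonal covariance).
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)]


# Hypothetical values standing in for the policy's outputs for one state.
mean = [0.5, -1.0]
log_std = [-0.5, 0.0]  # unconstrained real numbers
rng = random.Random(0)
action = sample_diag_gaussian(mean, log_std, rng)  # a list of two floats
```

Because each dimension has its own standard deviation and noise term, the cost of sampling grows linearly with the action dimension rather than requiring a full covariance matrix.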