Markov property
The probability distribution over the next state and reward depends only on the current state and action.
Neither the next state nor the reward depends on the earlier history of states and actions.
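Stated formally, using standard MDP notation (a sketch; the symbols $S_t$, $A_t$, $R_t$ for state, action, and reward at time $t$ are assumed, as the note does not fix notation):

```latex
\Pr\bigl(S_{t+1}=s',\, R_{t+1}=r \,\bigm|\, S_t, A_t, S_{t-1}, A_{t-1}, \dots, S_0, A_0\bigr)
  = \Pr\bigl(S_{t+1}=s',\, R_{t+1}=r \,\bigm|\, S_t, A_t\bigr)
```

In words: conditioning on the full history gives the same distribution as conditioning on the current state and action alone.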