Markov property

The probability distribution over the next state and reward depends only on the current state and action. Given the current state and action, neither the next state nor the reward depends on the earlier history of states, actions, and rewards.
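
Formally, this can be written as follows (a sketch using the common MDP notation, where S_t, A_t, and R_t denote the state, action, and reward at time t; this notation is assumed, not defined in the original):

\[
\Pr\!\left(S_{t+1}=s',\, R_{t+1}=r \mid S_t, A_t\right)
= \Pr\!\left(S_{t+1}=s',\, R_{t+1}=r \mid S_0, A_0, R_1, \ldots, S_{t-1}, A_{t-1}, R_t, S_t, A_t\right).
\]

In words: conditioning on the full history gives the same distribution as conditioning on the current state and action alone, which is why the pair (S_t, A_t) is a sufficient statistic for predicting what happens next.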