The advantage function measures how much better taking action $a$ in state $s$ is compared to the average value of that state. Specifically, it is the difference between the Q-value $Q(s, a)$ (the expected return starting from $s$ and taking action $a$) and the state-value $V(s)$ (the expected return from state $s$):

$$A(s, a) = Q(s, a) - V(s)$$
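
As a small numerical illustration of this difference, the sketch below computes advantages for a single state. The Q-values, the action names, and the policy probabilities are made-up numbers chosen for illustration, and the state-value is assumed to be the policy-weighted average of the Q-values.

```python
# Hypothetical Q-values for one state with three actions (illustrative numbers).
q_values = {"left": 1.0, "stay": 2.0, "right": 4.0}

# Hypothetical policy: probability of choosing each action in this state.
policy = {"left": 0.25, "stay": 0.25, "right": 0.5}


def state_value(q, pi):
    """State-value as the policy-weighted average of the Q-values."""
    return sum(pi[a] * q[a] for a in q)


def advantages(q, pi):
    """Advantage of each action: its Q-value minus the state-value."""
    v = state_value(q, pi)
    return {a: q[a] - v for a in q}


v = state_value(q_values, policy)
adv = advantages(q_values, policy)

print("V(s) =", v)
for action, value in adv.items():
    print(f"A(s, {action}) = {value}")
```

A positive advantage marks an action that does better than the policy's average in that state, and a negative one marks an action that does worse. Under this assumption about the state-value, the advantages weighted by the policy probabilities sum to zero.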