Multi-Agent Advantage Decomposition Theorem
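Let $i_{1:n}$ be a permutation of the $n$ agents. For any joint observation $o$ and joint action $a = a^{i_{1:n}}$:
$$A^{i_{1:n}}_{\pi}\big(o, a^{i_{1:n}}\big) = \sum_{m=1}^{n} A^{i_m}_{\pi}\big(o, a^{i_{1:m-1}}, a^{i_m}\big)$$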
The total advantage of a joint action can be decomposed into a sum of individual advantages, where each agent’s advantage is conditioned on the actions chosen by previous agents in the permutation.
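A minimal numerical check of the decomposition, assuming a single fixed observation $o$, a randomly generated joint Q-table and independent random policies (toy data, not from the source). It uses the definitions commonly paired with this theorem, assumed here: $Q^{i_{1:m}}_\pi(o, a^{i_{1:m}})$ is the joint Q averaged over the actions of the remaining agents drawn from their policies, $V_\pi(o)$ is the case $m = 0$, and $A^{i_m}_\pi(o, a^{i_{1:m-1}}, a^{i_m}) = Q^{i_{1:m}}_\pi(o, a^{i_{1:m}}) - Q^{i_{1:m-1}}_\pi(o, a^{i_{1:m-1}})$. With these, the sum telescopes to $Q_\pi(o, a) - V_\pi(o)$. The helper names `marginal_q` and `decomposition_error` are hypothetical.

```python
import itertools
import numpy as np

def marginal_q(q, pols, m):
    """Q^{i_{1:m}}(o, a^{i_{1:m}}): average the joint Q over the actions of the
    agents after position m, each drawn from its own policy pi^{i}(. | o)."""
    out = q
    for k in range(len(pols) - 1, m - 1, -1):   # integrate out the last axis first
        out = np.tensordot(out, pols[k], axes=([k], [0]))
    return out

def decomposition_error(q, policies, perm):
    """Max |A^{i_{1:n}}(o, a) - sum_m A^{i_m}(o, a^{i_{1:m-1}}, a^{i_m})| over all joint actions."""
    q = np.transpose(q, perm)                   # reorder axes into permutation order
    pols = [policies[i] for i in perm]
    n = len(pols)
    qs = [marginal_q(q, pols, m) for m in range(n + 1)]  # qs[0] is V(o), qs[n] is Q(o, a)
    worst = 0.0
    for a in itertools.product(*(range(len(p)) for p in pols)):
        joint_adv = qs[n][a] - qs[0][()]
        # A^{i_m}(o, a^{i_{1:m-1}}, a^{i_m}) = Q^{i_{1:m}} - Q^{i_{1:m-1}}
        summed = sum(qs[m][a[:m]] - qs[m - 1][a[:m - 1]] for m in range(1, n + 1))
        worst = max(worst, abs(joint_adv - summed))
    return worst

rng = np.random.default_rng(0)
sizes = (3, 4, 2)                               # |A^1|, |A^2|, |A^3| (arbitrary toy sizes)
q = rng.normal(size=sizes)                      # joint Q(o, a) for one fixed observation o
policies = [rng.dirichlet(np.ones(k)) for k in sizes]
print(decomposition_error(q, policies, perm=(2, 0, 1)))  # should be at floating-point round-off level
```

Any permutation gives the same result, which is the point of the theorem: the ordering of agents is free.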
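This enables sequential decision making: if every individual advantage in the decomposition is positive, so is the advantage of the joint action.
Agent $i_1$ goes first and picks an action $a^{i_1}$ with positive advantage $A^{i_1}_\pi(o, a^{i_1})$.
Agent $i_2$, knowing $a^{i_1}$, chooses $a^{i_2}$ with positive $A^{i_2}_\pi(o, a^{i_1}, a^{i_2})$.
Agent $i_3$, knowing $a^{i_1}$ and $a^{i_2}$, …
Instead of searching the entire joint action space $\mathcal{A}^{1} \times \cdots \times \mathcal{A}^{n}$ for an improving joint action, we can search each agent's action space $\mathcal{A}^{i_m}$ separately.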
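The sequential procedure can be sketched on the same kind of toy problem. This is a simplified illustration under stated assumptions: one fixed observation, a known joint Q-table, and each agent greedily taking the argmax of its conditional advantage given the actions its predecessors already fixed. The greedy argmax rule and the names `marginal_q` and `sequential_choice` are hypothetical choices for the sketch, not the source's method.

```python
import numpy as np

def marginal_q(q, pols, m):
    """Q^{i_{1:m}}: average the joint Q over the agents after position m under their policies."""
    out = q
    for k in range(len(pols) - 1, m - 1, -1):
        out = np.tensordot(out, pols[k], axes=([k], [0]))
    return out

def sequential_choice(q, policies, perm):
    """Agents act in permutation order; each scores only its own actions, conditioned on
    the actions already fixed by earlier agents, and keeps the best one (greedy rule)."""
    q = np.transpose(q, perm)                  # axes in permutation order
    pols = [policies[i] for i in perm]
    n = len(pols)
    qs = [marginal_q(q, pols, m) for m in range(n + 1)]
    chosen, advantages = (), []
    for m in range(1, n + 1):
        # A^{i_m}(o, a^{i_{1:m-1}}, .) for every candidate action of agent i_m
        adv = qs[m][chosen] - qs[m - 1][chosen]
        best = int(np.argmax(adv))
        chosen += (best,)
        advantages.append(float(adv[best]))
    joint_adv = float(qs[n][chosen] - qs[0][()])  # A^{i_{1:n}}(o, a) = Q(o, a) - V(o)
    return chosen, advantages, joint_adv

rng = np.random.default_rng(1)
sizes = (3, 4, 2)
q = rng.normal(size=sizes)
policies = [rng.dirichlet(np.ones(k)) for k in sizes]
actions, advs, joint_adv = sequential_choice(q, policies, perm=(1, 2, 0))
print(actions, advs, joint_adv)  # actions[m] belongs to agent perm[m]
```

Each chosen advantage is non-negative because $Q^{i_{1:m-1}}_\pi$ is the $\pi^{i_m}$-weighted average of the candidates, so the best candidate is at least that average; by the decomposition, the joint advantage is then non-negative as well. This guarantees a joint action no worse than the current policy's expected value, not the optimal joint action.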
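→ Each agent makes its decision based on information about what the preceding agents have done, incrementally improving the joint action.
→ The computational savings grow with the number of agents: the joint action space has $\prod_{i=1}^{n} |\mathcal{A}^i|$ elements (exponential in $n$), while the sequential search examines only $\sum_{i=1}^{n} |\mathcal{A}^i|$ candidates (linear in $n$).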
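To make the growth concrete, a small count, assuming (hypothetically) that every agent has 5 actions: a joint search scores $5^n$ candidates, while the sequential scheme scores $5n$, one pass over each agent's own actions. The function names are illustrative.

```python
from math import prod

def joint_search_size(action_counts):
    """Candidates scored when searching the joint action space: product of |A^i|."""
    return prod(action_counts)

def sequential_search_size(action_counts):
    """Candidates scored when each agent searches only its own space: sum of |A^i|."""
    return sum(action_counts)

for n in (2, 5, 10):
    counts = [5] * n  # hypothetical: n agents with 5 actions each
    print(n, joint_search_size(counts), sequential_search_size(counts))
# 2 25 10
# 5 3125 25
# 10 9765625 50
```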