Multi-Agent Advantage Decomposition Theorem

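Let $i_{1:n}$ be a permutation of the $n$ agents. For any joint observation $\mathbf{o}$ and joint action $\mathbf{a} = a^{i_{1:n}}$:

$$A_{\pi}^{i_{1:n}}\left(\mathbf{o}, a^{i_{1:n}}\right) = \sum_{m=1}^{n} A_{\pi}^{i_m}\left(\mathbf{o}, a^{i_{1:m-1}}, a^{i_m}\right)$$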

The total advantage of a joint action can be decomposed into a sum of individual advantages, where each agent’s advantage is conditioned on the actions chosen by previous agents in the permutation.
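
The decomposition can be checked numerically on a toy problem. The sketch below is not taken from any particular implementation: it assumes a single fixed observation, a randomly generated joint Q-table (`Q`), and independent per-agent policies (`policies`), all of which are hypothetical. The helper `q_subset` averages out the agents whose actions are not yet fixed, so each term `q_subset(fixed) - before` plays the role of one agent's advantage conditioned on its predecessors' actions.

```python
import itertools
import random

random.seed(0)
n_agents, n_actions = 3, 4

# Hypothetical joint Q-table for one fixed observation: joint action -> value.
joint_actions = list(itertools.product(range(n_actions), repeat=n_agents))
Q = {a: random.uniform(-1.0, 1.0) for a in joint_actions}


def random_policy():
    """Return a random probability distribution over one agent's actions."""
    weights = [random.random() for _ in range(n_actions)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical independent per-agent policies.
policies = [random_policy() for _ in range(n_agents)]


def q_subset(fixed):
    """Expected Q when the agents in `fixed` (agent -> action) play the given
    actions and every other agent samples from its own policy."""
    total = 0.0
    for a in joint_actions:
        if any(a[i] != act for i, act in fixed.items()):
            continue
        prob = 1.0
        for i in range(n_agents):
            if i not in fixed:
                prob *= policies[i][a[i]]
        total += prob * Q[a]
    return total


def check(perm, joint_action):
    """Return (joint advantage, sum of per-agent conditional advantages)."""
    joint_advantage = Q[joint_action] - q_subset({})  # Q(o, a) - V(o)
    fixed, summed = {}, 0.0
    for agent in perm:
        before = q_subset(fixed)            # value given predecessors only
        fixed[agent] = joint_action[agent]  # now also fix this agent's action
        summed += q_subset(fixed) - before  # this agent's conditional advantage
    return joint_advantage, summed


# The identity should hold for every ordering of the agents.
for perm in itertools.permutations(range(n_agents)):
    lhs, rhs = check(perm, (1, 3, 0))
    assert abs(lhs - rhs) < 1e-9
```

The sum telescopes: each term cancels against the next, leaving only the joint Q-value minus the state value, which is why the identity holds for any permutation.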

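Instead of searching the entire joint action space, of size $\prod_{i=1}^{n} |\mathcal{A}^i|$, we can search each agent’s action space separately, for a total of $\sum_{i=1}^{n} |\mathcal{A}^i|$ candidates.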
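→ Each agent can make its decision based on information about what the preceding agents have done, so the joint action is improved incrementally.
→ The computational savings grow with the number of agents: with $n$ agents and $k$ actions each, the search shrinks from $k^n$ joint actions to $nk$ per-agent candidates.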
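
As a rough illustration of the points above, the sketch below compares the number of candidates in a joint search against a sequential, agent-by-agent search. It is built on assumptions: the Q-table is random, the current policy of every agent is uniform, and `q_prefix` and `sequential_greedy` are hypothetical helpers. Here each conditional advantage is computed by brute-force averaging over the table; in practice it would come from a learned estimator, so the counts refer only to how many candidate actions are compared.

```python
import itertools
import random

random.seed(1)
n_agents, n_actions = 4, 5

# Hypothetical joint Q-table for one fixed observation.
joint_actions = list(itertools.product(range(n_actions), repeat=n_agents))
Q = {a: random.uniform(-1.0, 1.0) for a in joint_actions}


def q_prefix(prefix):
    """Expected Q when the first len(prefix) agents play `prefix` and the
    remaining agents act uniformly at random (assumed current policy)."""
    matches = [Q[a] for a in joint_actions if a[:len(prefix)] == prefix]
    return sum(matches) / len(matches)


def sequential_greedy():
    """Each agent, in order, picks the action with the largest advantage
    conditioned on the actions already chosen by its predecessors."""
    prefix, evaluations = (), 0
    for _ in range(n_agents):
        base = q_prefix(prefix)
        best_action, best_adv = None, float("-inf")
        for action in range(n_actions):
            adv = q_prefix(prefix + (action,)) - base
            evaluations += 1
            if adv > best_adv:
                best_action, best_adv = action, adv
        prefix += (best_action,)
    return prefix, evaluations


chosen, sequential_evals = sequential_greedy()
joint_evals = len(joint_actions)               # n_actions ** n_agents
joint_advantage = Q[chosen] - q_prefix(())     # Q(o, a) - V(o)

print(f"joint search: {joint_evals} candidates")
print(f"sequential search: {sequential_evals} candidates")
print(f"joint advantage of the chosen action: {joint_advantage:.3f}")
```

Because each agent's best conditional advantage is at least zero (its average under the current policy is zero), the decomposition implies the chosen joint action has a non-negative joint advantage. The sequential search is not guaranteed to find the joint action with the highest Q-value overall; it only guarantees an improvement over the current policy's expected value.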