Multi-Agent Advantage Decomposition Theorem


Let $i_{1 : n}$ be a permutation of agents. For any joint observation $o$ and joint action $a$:
$A_{π}^{i_{1 : n}} (o, a^{i_{1 : n}}) = \sum_{m = 1}^{n} A_{π}^{i_{m}} (o, a^{i_{1 : m - 1}}, a^{i_{m}})$
The total advantage of a joint action can be decomposed into a sum of individual advantages, where each agent’s advantage is conditioned on the actions chosen by previous agents in the permutation.
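
One way to see why the decomposition holds is a telescoping sum. The sketch below assumes the usual definitions, which this note does not state explicitly: $Q_{π}^{i_{1 : m}} (o, a^{i_{1 : m}})$ is the expected return when agents $i_{1 : m}$ take $a^{i_{1 : m}}$ and the remaining agents follow $π$, with $Q_{π}^{i_{1 : 0}} (o) = V_{π} (o)$, and each conditional advantage is the difference of two consecutive such terms.

$$
\begin{aligned}
\sum_{m = 1}^{n} A_{π}^{i_{m}} (o, a^{i_{1 : m - 1}}, a^{i_{m}})
&= \sum_{m = 1}^{n} \left[ Q_{π}^{i_{1 : m}} (o, a^{i_{1 : m}}) - Q_{π}^{i_{1 : m - 1}} (o, a^{i_{1 : m - 1}}) \right] \\
&= Q_{π}^{i_{1 : n}} (o, a^{i_{1 : n}}) - V_{π} (o) \\
&= A_{π}^{i_{1 : n}} (o, a^{i_{1 : n}})
\end{aligned}
$$

All intermediate terms cancel, so under these definitions the identity holds for any ordering of the agents.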

This enables sequential decision making.

Agent $i_{1}$ goes first and picks an action $a^{i_{1}}$, aiming for a positive advantage $A_{π}^{i_{1}} (o, a^{i_{1}}) > 0$.
Agent $i_{2}$, knowing $a^{i_{1}}$, chooses $a^{i_{2}}$ such that $A_{π}^{i_{2}} (o, a^{i_{1}}, a^{i_{2}}) > 0$.
Agent $i_{3}$, knowing $(a^{i_{1}}, a^{i_{2}})$, …

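Instead of searching the entire joint action space, which has $\prod_{i = 1}^{n} |A^{i}|$ elements, we can search each agent's action space in turn, for a total of $\sum_{i = 1}^{n} |A^{i}|$ candidate actions.
→ Each agent can make its decision based on information about what the previous agents have done, incrementally improving the joint action.
→ The computational savings grow with the number of agents: the joint space grows exponentially in $n$, while the sequential search grows linearly. For example, with 10 agents and 5 actions each, the joint space has $5^{10} = 9{,}765{,}625$ actions, while the sequential search compares only 50.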
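
The sequential scheme above can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not a reference implementation: the random table `Q`, the action-space sizes in `sizes`, the uniform per-agent policies in `pi`, and the helper names `q_prefix`, `advantage`, `joint_advantage`, and `sequential_greedy` are all hypothetical. The permutation is taken to be the axis order of `Q`, and the conditional advantage is computed as the difference of two prefix Q-values.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
sizes = (3, 4, 2)            # hypothetical |A^i| for three agents, in permutation order
Q = rng.normal(size=sizes)   # toy joint Q(o, a) for one fixed observation o
# Assumption: the joint policy factorizes per agent; uniform here for simplicity.
pi = [np.full(k, 1.0 / k) for k in sizes]


def q_prefix(prefix):
    """Q for the first len(prefix) agents, averaged over the remaining agents' policies."""
    q = Q[tuple(prefix)]
    for p in reversed(pi[len(prefix):]):
        q = q @ p            # contract the last remaining agent's action axis
    return float(q)


def advantage(prefix, a_m):
    """Conditional advantage of action a_m given the earlier agents' actions in prefix."""
    return q_prefix(list(prefix) + [a_m]) - q_prefix(prefix)


def joint_advantage(a):
    """Advantage of the full joint action a relative to the policy's value."""
    return q_prefix(a) - q_prefix([])


# Check the decomposition for every joint action in the toy problem.
for a in itertools.product(*(range(k) for k in sizes)):
    total = sum(advantage(a[:m], a[m]) for m in range(len(a)))
    assert np.isclose(total, joint_advantage(a))


def sequential_greedy():
    """Each agent in turn picks its best action given the earlier agents' choices."""
    chosen, evaluations = [], 0
    for k in sizes:
        advs = [advantage(chosen, a) for a in range(k)]
        evaluations += k     # one candidate comparison per action of this agent
        chosen.append(int(np.argmax(advs)))
    return chosen, evaluations


chosen, evaluations = sequential_greedy()
print("sequential choice:", chosen, "joint advantage:", joint_advantage(chosen))
print("candidates compared:", evaluations, "vs joint space:", int(np.prod(sizes)))
```

In this sketch each agent's best conditional advantage is non-negative, because the advantages average to zero under its own policy, so the chosen joint action has non-negative joint advantage. It is not guaranteed to be the best joint action overall; the greedy pass trades that guarantee for comparing 9 candidates instead of 24.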

Max Wolf's Second Brain
