year: 2021
paper: https://arxiv.org/pdf/2103.01955
website: https://sites.google.com/view/mappo
code: https://github.com/marlbenchmark/on-policy
connections: PPO, MARL
MAPPO objective
The objective of MAPPO is equivalent to that of PPO, except that every agent is equipped with the same shared set of parameters, i.e. $\theta_i = \theta$ for all agents $i$. The combined trajectories of all agents are used for the shared policy's update.
So the objective for optimizing the policy parameters $\theta$ at iteration $k$ is:

$$\theta_{k+1} = \arg\max_{\theta} \; \mathbb{E}_{o, a \sim \pi_{\theta_k}} \left[ \sum_{i=1}^{n} \min\left( \frac{\pi_\theta(a_i \mid o_i)}{\pi_{\theta_k}(a_i \mid o_i)} \, A_i^{\pi_{\theta_k}},\; \operatorname{clip}\!\left(\frac{\pi_\theta(a_i \mid o_i)}{\pi_{\theta_k}(a_i \mid o_i)},\, 1-\epsilon,\, 1+\epsilon\right) A_i^{\pi_{\theta_k}} \right) \right]$$

The constraint on the joint policy space imposed by the shared parameters can lead to an exponentially worse sub-optimal outcome (details; page 4).
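As a concrete illustration, here is a minimal PyTorch sketch of the shared-parameter policy loss, assuming a shared `actor` network that maps per-agent observations to a `torch.distributions.Categorical` over discrete actions, and that all agents' transitions have already been concatenated into one batch. The function name `mappo_policy_loss` and the tensor layout are my own choices, not taken from the official repo; the value loss, entropy bonus, and advantage (GAE) computation are omitted.

```python
import torch

def mappo_policy_loss(actor, obs, actions, old_log_probs, advantages, clip_eps=0.2):
    """Clipped PPO surrogate over the pooled transitions of all agents.

    Because every agent shares the same actor parameters theta, the per-agent
    trajectories are simply concatenated along the batch dimension before
    this loss is computed.

    obs:           (batch, obs_dim)  per-agent observations, all agents stacked
    actions:       (batch,)          discrete action indices taken under pi_{theta_k}
    old_log_probs: (batch,)          log pi_{theta_k}(a_i | o_i), stored at rollout time
    advantages:    (batch,)          per-agent advantage estimates A_i
    """
    dist = actor(obs)                               # hypothetical: returns a Categorical
    log_probs = dist.log_prob(actions)              # log pi_theta(a_i | o_i)

    ratio = torch.exp(log_probs - old_log_probs)    # importance ratio per transition
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages

    # Maximizing the clipped objective == minimizing the negative of its mean
    return -torch.min(unclipped, clipped).mean()
```

In practice, "combined trajectories" just means flattening the rollout buffer across agents, e.g. reshaping tensors of shape `(T, n_agents, ...)` to `(T * n_agents, ...)` before computing this loss.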