Communication-Efficient Cooperative Multi-Agent PPO via Regulated Segment Mixture in Internet of Vehicles

08/08/2023
by Xiaoxue Yu, et al.
Multi-Agent Reinforcement Learning (MARL) has become a classic paradigm for solving diverse intelligent control tasks, such as autonomous driving in the Internet of Vehicles (IoV). However, the widely assumed existence of a central node for implementing centralized federated learning-assisted MARL might be impractical in highly dynamic scenarios, and the excessive communication overhead may overwhelm the IoV system. Therefore, in this paper, we design a communication-efficient cooperative MARL algorithm, named RSM-MAPPO, to reduce the communication overhead in a fully distributed architecture. In particular, RSM-MAPPO enhances multi-agent Proximal Policy Optimization (PPO) by incorporating the idea of segment mixture and augmenting multiple model replicas from received neighboring policy segments. Afterwards, RSM-MAPPO adopts a theory-guided metric to regulate the selection of contributive replicas so as to guarantee policy improvement. Finally, extensive simulations in a mixed-autonomy traffic control scenario verify the effectiveness of the RSM-MAPPO algorithm.
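
The abstract names three steps (segment mixture, replica augmentation, regulated selection) without giving their details, so the following is only a minimal sketch of how such a pipeline could look, not the authors' implementation. Everything in it is an assumption made for illustration: the policy is a flat list of floats, segments are contiguous slices, each replica is the local policy with one segment replaced by a neighbor's, and the `score` callable is a hypothetical stand-in for the paper's theory-guided metric, which the abstract does not define. All function names are hypothetical as well.

```python
from typing import Callable, List, Sequence, Tuple

Params = List[float]


def split_segments(params: Sequence[float], num_segments: int) -> List[Params]:
    """Split a flat parameter vector into contiguous, near-equal segments."""
    n = len(params)
    bounds = [(i * n) // num_segments for i in range(num_segments + 1)]
    return [list(params[bounds[i]:bounds[i + 1]]) for i in range(num_segments)]


def build_replicas(
    local: Sequence[float],
    received: Sequence[Tuple[int, Sequence[float]]],
    num_segments: int,
) -> List[Params]:
    """Build one replica per received (segment_index, values) pair.

    Assumption: a replica is the local policy with exactly one segment
    swapped for the neighbor's copy of that segment.
    """
    segments = split_segments(local, num_segments)
    replicas = []
    for idx, values in received:
        if len(values) != len(segments[idx]):
            raise ValueError("received segment has the wrong length")
        mixed = segments[:idx] + [list(values)] + segments[idx + 1:]
        replicas.append([p for seg in mixed for p in seg])
    return replicas


def regulated_mixture(
    local: Sequence[float],
    replicas: Sequence[Sequence[float]],
    score: Callable[[Sequence[float]], float],
) -> Params:
    """Average the local policy with replicas that score strictly higher.

    `score` is a hypothetical placeholder for the selection metric;
    replicas that do not beat the local policy are discarded.
    """
    base = score(local)
    kept = [list(r) for r in replicas if score(r) > base]
    models = [list(local)] + kept
    return [sum(vals) / len(models) for vals in zip(*models)]


if __name__ == "__main__":
    local_policy = [0.0, 0.0, 0.0, 0.0]
    # Two neighbors each send one segment instead of a full model.
    neighbor_segments = [(0, [1.0, 1.0]), (1, [-1.0, -1.0])]
    candidates = build_replicas(local_policy, neighbor_segments, num_segments=2)
    print(regulated_mixture(local_policy, candidates, score=sum))
```

In this toy setup each neighbor transmits one segment rather than a full parameter vector, which is where the communication saving would come from under these assumptions. Using `sum` as the score is purely a placeholder to make the example runnable.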

research
03/08/2021

Provably Efficient Cooperative Multi-Agent Reinforcement Learning with Function Approximation

Reinforcement learning in cooperative multi-agent settings has recently ...
research
01/20/2023

On Multi-Agent Deep Deterministic Policy Gradients and their Explainability for SMARTS Environment

Multi-Agent RL or MARL is one of the complex problems in Autonomous Driv...
research
03/02/2021

The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games

Proximal Policy Optimization (PPO) is a popular on-policy reinforcement ...
research
12/03/2022

DACOM: Learning Delay-Aware Communication for Multi-Agent Reinforcement Learning

Communication is supposed to improve multi-agent collaboration and overa...
research
11/07/2021

Coordinated Proximal Policy Optimization

We present Coordinated Proximal Policy Optimization (CoPPO), an algorith...
research
01/05/2021

Neurosymbolic Transformers for Multi-Agent Communication

We study the problem of inferring communication structures that can solv...
research
05/22/2021

6G V2X Technologies and Orchestrated Sensing for Autonomous Driving

6G technology targets to revolutionize the mobility industry by revampin...
