Scalable Multi-agent Covering Option Discovery based on Kronecker Graphs

07/21/2023
by Jiayu Chen, et al.

Covering skill (a.k.a. option) discovery has been developed to improve the exploration of reinforcement learning (RL) in single-agent scenarios with sparse reward signals, by connecting the most distant states in the embedding space provided by the Fiedler vector of the state transition graph. Because the joint state space grows exponentially with the number of agents in multi-agent systems, existing approaches that still rely on single-agent skill discovery either become computationally prohibitive or fail to directly discover joint skills that improve the connectivity of the joint state space. In this paper, we propose a multi-agent skill discovery approach that admits easy decomposition. Our key idea is to approximate the joint state space as a Kronecker graph, based on which we can directly estimate its Fiedler vector using the Laplacian spectra of the individual agents' transition graphs. Further, since directly computing the Laplacian spectrum is intractable for tasks with infinite-scale state spaces, we propose a deep learning extension of our method that estimates eigenfunctions through NN-based representation learning techniques. The evaluation on multi-agent tasks built with simulators such as MuJoCo shows that the proposed algorithm can successfully identify multi-agent skills and significantly outperforms the state of the art. Code is available at: https://github.itap.purdue.edu/Clan-labs/Scalable_MAOD_via_KP.
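To make the Kronecker-graph idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes two agents with small, undirected transition graphs and uses the normalized Laplacian, for which a standard identity holds: if the joint adjacency matrix is the Kronecker product of the factor adjacency matrices, the joint eigenvalues are 1 - (1 - λ_i)(1 - μ_j) and the joint eigenvectors are Kronecker products of the factor eigenvectors. The function names (`normalized_laplacian`, `kron_fiedler`) and the toy graphs are hypothetical and chosen only for illustration; the paper's actual estimator may differ in detail.

```python
import numpy as np

def normalized_laplacian(adj):
    """Normalized Laplacian I - D^{-1/2} A D^{-1/2} (assumes no isolated nodes)."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]

def kron_fiedler(adj_a, adj_b):
    """Estimate the Fiedler pair of kron(adj_a, adj_b) from the factor spectra only."""
    lam_a, vec_a = np.linalg.eigh(normalized_laplacian(adj_a))
    lam_b, vec_b = np.linalg.eigh(normalized_laplacian(adj_b))
    # Joint eigenvalues: 1 - (1 - lam_a[i]) * (1 - lam_b[j]) for every pair (i, j).
    joint = 1.0 - np.outer(1.0 - lam_a, 1.0 - lam_b)
    # Pick the pair giving the second-smallest joint eigenvalue.
    order = np.argsort(joint, axis=None)
    i, j = np.unravel_index(order[1], joint.shape)
    return joint[i, j], np.kron(vec_a[:, i], vec_b[:, j])

# Toy factor graphs (illustrative assumption): a triangle and a 3-node path.
tri = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)

val, vec = kron_fiedler(tri, path)

# Sanity check against the direct (expensive) computation on the joint graph.
joint_lap = normalized_laplacian(np.kron(tri, path))
print(np.allclose(joint_lap @ vec, val * vec))
```

The point of the sketch is that only the two 3-by-3 factor Laplacians are decomposed; the 9-by-9 joint Laplacian is built solely for the check. When the second-smallest eigenvalue is repeated, as it is for these toy graphs, the Fiedler vector is not unique and the function returns one valid eigenvector.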


