It Takes Four to Tango: Multiagent Selfplay for Automatic Curriculum Generation

02/22/2022
by Yuqing Du, et al.

We are interested in training general-purpose reinforcement learning agents that can solve a wide variety of goals. Training such agents efficiently requires automatic generation of a goal curriculum. This is challenging as it requires (a) exploring goals of increasing difficulty, while ensuring that the agent (b) is exposed to a diverse set of goals in a sample-efficient manner and (c) does not catastrophically forget previously solved goals. We propose Curriculum Self Play (CuSP), an automated goal generation framework that seeks to satisfy these desiderata through a multi-player game with four agents. We extend the asymmetric curriculum learning in PAIRED (Dennis et al., 2020) to a symmetrized game that carefully balances cooperation and competition between two off-policy student learners and two regret-maximizing teachers. CuSP additionally introduces entropic goal coverage and accounts for the non-stationary nature of the students, allowing us to automatically induce a curriculum that balances progressive exploration with anti-catastrophic exploitation. We demonstrate that our method succeeds at generating effective curricula of goals for a range of control tasks, outperforming other methods at zero-shot test-time generalization to novel out-of-distribution goals.
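
To make the symmetrized, regret-based setup more concrete, here is a minimal Python sketch of how two teachers' reward signals might be computed from the two students' returns. It is one plausible reading of the abstract, not the paper's exact objective: the names `GoalOutcome` and `teacher_rewards` are hypothetical, each teacher is assumed to be "friendly" to one student, and regret is assumed to be the simple difference in episode returns on that teacher's proposed goal. The entropic goal-coverage term and the handling of student non-stationarity are omitted.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class GoalOutcome:
    """Episode returns of both students on a single proposed goal.

    Hypothetical container, introduced only for this illustration.
    """
    student_a: float
    student_b: float


def teacher_rewards(on_goal_a: GoalOutcome, on_goal_b: GoalOutcome) -> Tuple[float, float]:
    """Return (reward for teacher A, reward for teacher B).

    Assumption: teacher A proposes a goal and is rewarded by how much
    student A outperforms student B on it; teacher B is the mirror image.
    This makes the game symmetric: each student is the "protagonist" for
    one teacher and the "antagonist" for the other.
    """
    reward_teacher_a = on_goal_a.student_a - on_goal_a.student_b
    reward_teacher_b = on_goal_b.student_b - on_goal_b.student_a
    return reward_teacher_a, reward_teacher_b


# Example: student A does better on teacher A's goal, student B on teacher B's.
rewards = teacher_rewards(GoalOutcome(1.0, 0.25), GoalOutcome(0.5, 0.75))
```

Under these assumptions, a goal that both students solve equally well (or equally badly) gives its teacher zero reward, so each teacher is pushed toward goals at the boundary of what the students can currently do.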

research
06/14/2022

Stein Variational Goal Generation For Reinforcement Learning in Hard Exploration Problems

Multi-goal Reinforcement Learning has recently attracted a large amount ...
research
10/07/2021

Situated Dialogue Learning through Procedural Environment Generation

We teach goal-driven agents to interactively act and speak in situated e...
research
01/13/2021

Asymmetric self-play for automatic goal discovery in robotic manipulation

We train a single, goal-conditioned policy that can solve many robotic m...
research
09/27/2019

Automated curricula through setter-solver interactions

Reinforcement learning algorithms use correlations between policies and ...
research
02/18/2020

Generating Automatic Curricula via Self-Supervised Active Domain Randomization

Goal-directed Reinforcement Learning (RL) traditionally considers an age...
research
12/28/2020

Automatic Curriculum Learning With Over-repetition Penalty for Dialogue Policy Learning

Dialogue policy learning based on reinforcement learning is difficult to...
research
02/17/2021

Automated Curriculum Learning for Embodied Agents: A Neuroevolutionary Approach

We demonstrate how an evolutionary algorithm can be extended with a curr...
