Equivariant MuZero

02/09/2023
by Andreea Deac, et al.

Deep reinforcement learning repeatedly succeeds in closed, well-defined domains such as games (Chess, Go, StarCraft). The next frontier is real-world scenarios, where setups are numerous and varied. For this, agents need to learn the underlying rules governing the environment, so as to robustly generalise to conditions that differ from those they were trained on. Model-based reinforcement learning algorithms, such as the highly successful MuZero, aim to accomplish this by learning a world model. However, leveraging a world model has not consistently shown greater generalisation capabilities compared to model-free alternatives. In this work, we propose improving the data efficiency and generalisation capabilities of MuZero by explicitly incorporating the symmetries of the environment in its world-model architecture. We prove that, so long as the neural networks used by MuZero are equivariant to a particular symmetry group acting on the environment, the entirety of MuZero's action-selection algorithm will also be equivariant to that group. We evaluate Equivariant MuZero on procedurally-generated MiniPacman and on Chaser from the ProcGen suite: training on a set of mazes, and then testing on unseen rotated versions, demonstrating the benefits of equivariance. Further, we verify that our performance improvements hold even when only some of the components of Equivariant MuZero obey strict equivariance, which highlights the robustness of our construction.
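The equivariance property at the centre of the abstract can be illustrated with a small sketch. It is not the architecture from the paper: it assumes a square grid observation, four movement actions ordered clockwise, and the group of 90-degree rotations, and it obtains an equivariant policy by averaging an arbitrary policy over the group (a standard symmetrisation trick, swapped in here for simplicity). All names (`rotate_grid`, `rotate_logits`, `make_plain_policy`, `symmetrise`) are hypothetical and chosen for illustration only.

```python
import random

# Assumed action set, ordered clockwise so that a quarter-turn of the
# maze shifts every action to the next index.
ACTIONS = ["up", "right", "down", "left"]


def rotate_grid(grid, k=1):
    """Rotate a square grid clockwise by k quarter-turns."""
    for _ in range(k % 4):
        grid = [list(row) for row in zip(*grid[::-1])]
    return grid


def rotate_logits(logits, k=1):
    """Apply k clockwise quarter-turns to a list of per-action scores.

    The score that belonged to "up" moves to "right", and so on.
    A negative k applies the inverse rotation.
    """
    return [logits[(a - k) % 4] for a in range(4)]


def make_plain_policy(size, seed=0):
    """Build an arbitrary linear scoring function with no built-in symmetry."""
    rng = random.Random(seed)
    weights = [
        [[rng.randint(-3, 3) for _ in range(size)] for _ in range(size)]
        for _ in range(4)
    ]

    def policy(grid):
        return [
            sum(w * x
                for w_row, g_row in zip(weights[a], grid)
                for w, x in zip(w_row, g_row))
            for a in range(4)
        ]

    return policy


def symmetrise(policy):
    """Average a policy over the four rotations to make it equivariant.

    For each rotation g: rotate the input by g, score it, then undo g on
    the scores. The average f_eq satisfies f_eq(g . s) = g . f_eq(s).
    """
    def equivariant_policy(grid):
        total = [0, 0, 0, 0]
        for k in range(4):
            logits = rotate_logits(policy(rotate_grid(grid, k)), -k)
            total = [t + l for t, l in zip(total, logits)]
        return [t / 4 for t in total]

    return equivariant_policy


if __name__ == "__main__":
    maze = [[0, 1, 0], [2, 0, 0], [0, 0, 3]]
    policy = symmetrise(make_plain_policy(size=3))
    for k in range(4):
        # Scoring the rotated maze should equal rotating the scores.
        print(k, policy(rotate_grid(maze, k)) == rotate_logits(policy(maze), k))
```

With a policy of this kind, scoring a rotated maze gives the same numbers as scoring the original maze and then rotating the action scores, which is the behaviour one would want when testing on unseen rotated versions of training mazes.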
