Reward-Free Curricula for Training Robust World Models

06/15/2023
by Marc Rigter et al.

There has been a recent surge of interest in developing generally capable agents that can adapt to new tasks without additional training in the environment. Learning world models from reward-free exploration is a promising approach that enables policies for new tasks to be trained on imagined experience. Achieving a general agent requires robustness across different environments; however, different environments may require different amounts of data to learn a suitable world model. In this work, we address the problem of efficiently learning robust world models in the reward-free setting. As a measure of robustness, we consider the minimax regret objective. We show that minimising the maximum regret can be connected to minimising the maximum error of the world model across environments. This result informs our algorithm, WAKER: Weighted Acquisition of Knowledge across Environments for Robustness. WAKER selects the environment in which to collect data based on the estimated error of the world model for each environment. Our experiments demonstrate that WAKER outperforms naive domain randomisation, yielding improved robustness, efficiency, and generalisation.
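The abstract does not state the objective formally, but the standard minimax-regret objective it refers to can be written as follows, where \(\mathcal{E}\) is the set of environments, \(V_e^{\pi}\) is the expected return of policy \(\pi\) in environment \(e\), and \(\pi_e^{\ast}\) is the optimal policy for \(e\) (a reconstruction from the standard definition, not a formula quoted from the paper):

\[
\pi^{\ast} \in \operatorname*{arg\,min}_{\pi} \; \max_{e \in \mathcal{E}} \left( V_e^{\pi_e^{\ast}} - V_e^{\pi} \right)
\]

To make the environment-selection idea concrete, below is a minimal Python sketch of a WAKER-style curriculum, based only on the abstract's description: data collection is biased toward the environments where the world model's estimated error is largest. The error estimator (an exponential moving average of prediction error) and all names (EnvCurriculum, update_error, sample_env, explore_prob) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of an error-weighted environment curriculum in the
# spirit of WAKER: prioritise data collection in the environment where the
# world model is estimated to be worst. Not the authors' code.
import random
from collections import defaultdict

class EnvCurriculum:
    def __init__(self, env_ids, smoothing=0.9, explore_prob=0.1):
        self.env_ids = list(env_ids)
        self.smoothing = smoothing        # EMA factor for error estimates
        self.explore_prob = explore_prob  # chance of uniform sampling
        self.error = defaultdict(float)   # estimated model error per env

    def update_error(self, env_id, prediction_error):
        # Exponential moving average of the world model's (non-negative)
        # prediction error observed on data from this environment.
        old = self.error[env_id]
        self.error[env_id] = self.smoothing * old + (1 - self.smoothing) * prediction_error

    def sample_env(self):
        # Occasionally sample uniformly so error estimates stay fresh.
        if random.random() < self.explore_prob:
            return random.choice(self.env_ids)
        # Otherwise bias collection toward environments with high
        # estimated world-model error.
        weights = [self.error[e] for e in self.env_ids]
        total = sum(weights)
        if total == 0:
            return random.choice(self.env_ids)
        return random.choices(self.env_ids, weights=weights, k=1)[0]
```

In use, a reward-free exploration loop would call sample_env() to pick the next environment, collect a trajectory there, train the world model on it, and feed the resulting prediction error back via update_error(), so the curriculum keeps re-weighting toward the environments the model understands least.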
