Self-Supervised Exploration via Disagreement

06/10/2019
by Deepak Pathak, et al.

Efficient exploration is a long-standing problem in sensorimotor learning. Major advances have been demonstrated in noise-free, non-stochastic domains such as video games and simulation. However, most of these formulations either get stuck in environments with stochastic dynamics or are too inefficient to scale to real robotics setups. In this paper, we propose a formulation for exploration inspired by work in the active learning literature. Specifically, we train an ensemble of dynamics models and incentivize the agent to explore such that the disagreement among the ensemble's predictions is maximized. This allows the agent to learn skills by exploring in a self-supervised manner without any external reward. Notably, we further leverage the disagreement objective to optimize the agent's policy in a differentiable manner, without using reinforcement learning, which results in sample-efficient exploration. We demonstrate the efficacy of this formulation across a variety of benchmark environments, including stochastic Atari, MuJoCo, and Unity. Finally, we implement our differentiable exploration on a real robot that learns to interact with objects completely from scratch. Project videos and code are at https://pathak22.github.io/exploration-by-disagreement/
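The mechanism described in the abstract can be sketched compactly. What follows is a minimal NumPy illustration under our own simplifying assumptions, not the authors' released code. Each ensemble member is a hypothetical linear forward model fit by ridge regression on its own bootstrap resample of transitions. Disagreement is taken as the variance of the members' next-state predictions, averaged over state dimensions. The "differentiable" policy update is a single gradient-ascent step on a hypothetical linear policy `a = theta @ s`. The names `LinearDynamicsEnsemble` and `policy_step` are illustrative only.

```python
import numpy as np


class LinearDynamicsEnsemble:
    """Hypothetical ensemble of K linear forward models: s' ~= W_k @ [s; a]."""

    def __init__(self, num_models, state_dim, action_dim, seed=0):
        self.rng = np.random.default_rng(seed)
        self.state_dim = state_dim
        in_dim = state_dim + action_dim
        self.W = np.zeros((num_models, state_dim, in_dim))

    def fit(self, states, actions, next_states, ridge=1e-3):
        """Fit each member by ridge regression on its own bootstrap resample."""
        X = np.concatenate([states, actions], axis=1)       # (N, in_dim)
        n, in_dim = X.shape
        for k in range(self.W.shape[0]):
            idx = self.rng.integers(0, n, size=n)           # bootstrap indices
            Xb, Yb = X[idx], next_states[idx]
            gram = Xb.T @ Xb + ridge * np.eye(in_dim)
            self.W[k] = np.linalg.solve(gram, Xb.T @ Yb).T  # (state_dim, in_dim)

    def predict(self, state, action):
        """Next-state prediction from every member, shape (K, state_dim)."""
        x = np.concatenate([state, action])
        return self.W @ x

    def disagreement(self, state, action):
        """Intrinsic reward: variance across members, averaged over state dims."""
        return float(self.predict(state, action).var(axis=0).mean())

    def disagreement_grad_action(self, state, action):
        """Analytic gradient of disagreement() with respect to the action."""
        preds = self.predict(state, action)
        dev = preds - preds.mean(axis=0)                    # (K, state_dim)
        W_dev = self.W - self.W.mean(axis=0)                # (K, state_dim, in_dim)
        # d/dx of mean_j var_k[(W_k x)_j] = 2/(K*d) * sum_k (W_k - W_bar)^T dev_k
        grad_x = 2.0 / dev.size * np.einsum("kd,kdi->i", dev, W_dev)
        return grad_x[self.state_dim:]                      # keep the action part


def policy_step(theta, state, ensemble, lr=0.1):
    """One gradient-ascent step on a linear policy a = theta @ s (no RL)."""
    action = theta @ state
    grad_a = ensemble.disagreement_grad_action(state, action)
    # Chain rule: d(disagreement)/d(theta) = outer(grad_a, s) for a = theta @ s.
    return theta + lr * np.outer(grad_a, state)


# Usage on synthetic, noisy transitions (noise makes bootstrap members differ).
rng = np.random.default_rng(1)
S, A = rng.normal(size=(200, 3)), rng.normal(size=(200, 2))
S_next = S @ rng.normal(size=(3, 3)) + A @ rng.normal(size=(2, 3))
S_next += 0.5 * rng.normal(size=(200, 3))

ens = LinearDynamicsEnsemble(num_models=5, state_dim=3, action_dim=2)
ens.fit(S, A, S_next)

theta, s = np.zeros((2, 3)), rng.normal(size=3)
before = ens.disagreement(s, theta @ s)
theta = policy_step(theta, s, ens)
after = ens.disagreement(s, theta @ s)
print(f"disagreement before: {before:.4f}, after: {after:.4f}")
```

Because disagreement is a convex quadratic in the action for linear members, a small ascent step is guaranteed to increase it, but it is also unbounded, so a practical version would need action limits. With learned nonlinear models, the same gradient would come from automatic differentiation rather than the closed form used here.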
