Playing hard exploration games by watching YouTube

05/29/2018
by Yusuf Aytar, et al.

Deep reinforcement learning methods traditionally struggle with tasks where environment rewards are particularly sparse. One successful method of guiding exploration in these domains is to imitate trajectories provided by a human demonstrator. However, these demonstrations are typically collected under artificial conditions, i.e. with access to the agent's exact environment setup and the demonstrator's action and reward trajectories. Here we propose a two-stage method that overcomes these limitations by relying on noisy, unaligned footage without access to such data. First, we learn to map unaligned videos from multiple sources to a common representation using self-supervised objectives constructed over both time and modality (i.e. vision and sound). Second, we embed a single YouTube video in this representation to construct a reward function that encourages an agent to imitate human gameplay. This method of one-shot imitation allows our agent to convincingly exceed human-level performance on the infamously hard exploration games Montezuma's Revenge, Pitfall! and Private Eye for the first time, even if the agent is not presented with any environment rewards.
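The second stage described above, turning one embedded demonstration video into a reward signal, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify how the reward is computed, so the checkpoint scheme, the class name `CheckpointReward`, and the parameters `stride`, `threshold`, `bonus`, and `lookahead` (with their default values) are all hypothetical. The sketch also assumes that some already-trained embedding network has mapped both the demonstration frames and the agent's observations into the same vector space.

```python
import numpy as np


def normalize(x):
    """Scale each embedding vector to unit length (zero vectors stay zero)."""
    x = np.asarray(x, dtype=float)
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, 1e-12)


class CheckpointReward:
    """Hypothetical imitation reward built from one embedded demonstration.

    Every `stride`-th demonstration embedding becomes a checkpoint. The agent
    earns `bonus` when its current embedding is similar enough to one of the
    next `lookahead` unvisited checkpoints. All defaults are illustrative
    assumptions, not values taken from the abstract.
    """

    def __init__(self, demo_embeddings, stride=16, threshold=0.5,
                 bonus=0.5, lookahead=1):
        demo = normalize(demo_embeddings)
        self.checkpoints = demo[::stride]   # subsample the demonstration
        self.threshold = threshold          # minimum cosine similarity
        self.bonus = bonus                  # reward per checkpoint reached
        self.lookahead = lookahead          # how many checkpoints may be skipped to
        self.next_index = 0                 # first checkpoint not yet reached

    def __call__(self, agent_embedding):
        """Return the imitation reward for one embedded agent observation."""
        if self.next_index >= len(self.checkpoints):
            return 0.0                      # demonstration already completed
        z = normalize(agent_embedding)
        window = self.checkpoints[self.next_index:self.next_index + self.lookahead]
        sims = window @ z                   # cosine similarity to each candidate
        hits = np.flatnonzero(sims > self.threshold)
        if hits.size == 0:
            return 0.0
        # Advance past the furthest checkpoint matched in the window.
        self.next_index += int(hits[-1]) + 1
        return self.bonus
```

In use, an agent would call the object once per step on its embedded observation, for example `r = reward_fn(embed(frame))`, where `embed` stands in for the learned embedding network. Because checkpoints must be reached in order, the reward follows the demonstration's progress through the game and needs neither the demonstrator's actions nor any environment rewards.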


research
04/15/2021

Self-Supervised Exploration via Latent Bayesian Surprise

Training with Reinforcement Learning requires a reward function that is ...
research
07/03/2020

Learning intuitive physics and one-shot imitation using state-action-prediction self-organizing maps

Human learning and intelligence work differently from the supervised pat...
research
02/14/2020

Never Give Up: Learning Directed Exploration Strategies

We propose a reinforcement learning agent to solve hard exploration game...
research
06/26/2018

Adversarial Exploration Strategy for Self-Supervised Imitation Learning

We present an adversarial exploration strategy, a simple yet effective i...
research
11/05/2018

Contingency-Aware Exploration in Reinforcement Learning

This paper investigates whether learning contingency-awareness and contr...
research
05/29/2018

Observe and Look Further: Achieving Consistent Performance on Atari

Despite significant advances in the field of deep Reinforcement Learning...
research
02/19/2019

Learning to Generalize from Sparse and Underspecified Rewards

We consider the problem of learning from sparse and underspecified rewar...
