Reinforcement Learning with Action-Free Pre-Training from Videos

03/25/2022
by   Younggyo Seo, et al.

Recent unsupervised pre-training methods have been shown to be effective in the language and vision domains by learning representations useful for multiple downstream tasks. In this paper, we investigate whether such unsupervised pre-training methods can also be effective for vision-based reinforcement learning (RL). To this end, we introduce a framework that learns representations useful for understanding the dynamics via generative pre-training on videos. Our framework consists of two phases: we pre-train an action-free latent video prediction model, and then utilize the pre-trained representations for efficiently learning action-conditional world models on unseen environments. To incorporate additional action inputs during fine-tuning, we introduce a new architecture that stacks an action-conditional latent prediction model on top of the pre-trained action-free prediction model. Moreover, for better exploration, we propose a video-based intrinsic bonus that leverages the pre-trained representations. We demonstrate that our framework significantly improves both the final performance and the sample efficiency of vision-based RL on a variety of manipulation and locomotion tasks. Code is available at https://github.com/younggyoseo/apv.
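The stacked architecture described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation (which builds on latent world models such as DreamerV2): the module names, dimensions, and GRU-based cells here are assumptions chosen to show the key idea, namely that an action-free recurrent predictor is pre-trained on videos, and a second recurrent predictor that consumes its output latent together with the action is stacked on top during fine-tuning.

```python
import torch
import torch.nn as nn

class ActionFreePredictor(nn.Module):
    """Pre-trained on action-free videos: predicts the next latent
    from the current latent alone (hypothetical minimal form)."""
    def __init__(self, latent_dim=32, hidden_dim=64):
        super().__init__()
        self.cell = nn.GRUCell(latent_dim, hidden_dim)
        self.to_latent = nn.Linear(hidden_dim, latent_dim)

    def forward(self, z, h):
        h = self.cell(z, h)
        return self.to_latent(h), h

class ActionConditionalPredictor(nn.Module):
    """Stacked on top during fine-tuning: refines the action-free
    latent using the agent's action (hypothetical minimal form)."""
    def __init__(self, latent_dim=32, action_dim=4, hidden_dim=64):
        super().__init__()
        self.cell = nn.GRUCell(latent_dim + action_dim, hidden_dim)
        self.to_latent = nn.Linear(hidden_dim, latent_dim)

    def forward(self, z_free, action, h):
        h = self.cell(torch.cat([z_free, action], dim=-1), h)
        return self.to_latent(h), h

# One step of the stacked model on a batch of 2 (all dims are illustrative).
B, latent_dim, action_dim, hidden = 2, 32, 4, 64
free = ActionFreePredictor(latent_dim, hidden)        # weights would come from video pre-training
cond = ActionConditionalPredictor(latent_dim, action_dim, hidden)
z = torch.zeros(B, latent_dim)
h_free, h_cond = torch.zeros(B, hidden), torch.zeros(B, hidden)
action = torch.zeros(B, action_dim)

z_free, h_free = free(z, h_free)                # action-free dynamics (pre-trained)
z_next, h_cond = cond(z_free, action, h_cond)   # action-conditional refinement
```

During fine-tuning, only the pre-trained action-free weights need to transfer across environments; the action-conditional layer is learned per task on top of them.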


