PlayVirtual: Augmenting Cycle-Consistent Virtual Trajectories for Reinforcement Learning

by Tao Yu, et al.

Learning good feature representations is important for deep reinforcement learning (RL). However, with limited experience, RL often suffers from data inefficiency during training. For unexperienced or rarely experienced trajectories (i.e., state-action sequences), the lack of data limits their use for better feature learning. In this work, we propose a novel method, dubbed PlayVirtual, which augments cycle-consistent virtual trajectories to enhance the data efficiency of RL feature representation learning. Specifically, PlayVirtual uses a forward dynamics model to predict future states from the current state and action, and then uses a backward dynamics model to predict the previous states, which forms a trajectory cycle. Based on this, we augment the actions to generate a large number of virtual state-action trajectories. Because these virtual trajectories have no ground-truth state supervision, we instead require each trajectory to satisfy the cycle-consistency constraint, which can significantly enhance data efficiency. We validate the effectiveness of our designs on the Atari and DeepMind Control Suite benchmarks. Our method outperforms the current state-of-the-art methods by a large margin on both benchmarks.
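To make the forward/backward cycle concrete, here is a minimal NumPy sketch of a cycle-consistency loss over virtual trajectories. It is not the authors' implementation: the random linear-plus-tanh maps stand in for the learned dynamics networks, and all names (`forward_model`, `backward_model`, `cycle_consistency_loss`, `virtual_trajectory_losses`, `LATENT_DIM`, `HORIZON`) are hypothetical. Using squared Euclidean distance and sampling virtual actions uniformly in [-1, 1] are also assumptions. Only the loss is computed; in training it would be minimized jointly with the RL objective through autodiff, which this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 8   # hypothetical latent state size
ACTION_DIM = 2   # hypothetical continuous action size
HORIZON = 5      # hypothetical number of virtual steps K

# Hypothetical stand-ins for the learned forward and backward dynamics models:
# each maps [latent state; action] to another latent state.
W_fwd = rng.normal(scale=0.3, size=(LATENT_DIM, LATENT_DIM + ACTION_DIM))
W_bwd = rng.normal(scale=0.3, size=(LATENT_DIM, LATENT_DIM + ACTION_DIM))


def forward_model(z, a):
    """Predict the next latent state from the current state and action."""
    return np.tanh(W_fwd @ np.concatenate([z, a]))


def backward_model(z_next, a):
    """Predict the previous latent state from the next state and the action taken."""
    return np.tanh(W_bwd @ np.concatenate([z_next, a]))


def cycle_consistency_loss(z0, actions):
    """Roll forward z0 -> zK, roll back zK -> z0_hat in reverse action order,
    and measure how far the reconstructed start state is from z0."""
    z = z0
    for a in actions:            # forward pass through the virtual actions
        z = forward_model(z, a)
    for a in actions[::-1]:      # backward pass, actions in reverse order
        z = backward_model(z, a)
    # Squared Euclidean distance (an assumption; other distances are possible).
    return float(np.sum((z - z0) ** 2))


def virtual_trajectory_losses(z0, num_virtual):
    """Augment actions: sample random action sequences, which need no
    ground-truth next states, and score each virtual trajectory's cycle."""
    losses = []
    for _ in range(num_virtual):
        actions = rng.uniform(-1.0, 1.0, size=(HORIZON, ACTION_DIM))
        losses.append(cycle_consistency_loss(z0, actions))
    return losses


# Usage: one real starting latent state, four virtual trajectories.
z0 = rng.normal(size=LATENT_DIM)
losses = virtual_trajectory_losses(z0, num_virtual=4)
```

The key design point is that the loss compares the reconstructed start state only with the real start state `z0`. Any sampled action sequence therefore yields a training signal, even though no real observations exist for the intermediate virtual states.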




Learning State Representations via Retracing in Reinforcement Learning

We propose learning via retracing, a novel self-supervised approach for ...

Accelerating Representation Learning with View-Consistent Dynamics in Data-Efficient Reinforcement Learning

Learning informative representations from image-based observations is of...

Return-Based Contrastive Representation Learning for Reinforcement Learning

Recently, various auxiliary tasks have been proposed to accelerate repre...

Dynamics-aware Embeddings

In this paper we consider self-supervised representation learning to imp...

Data-Driven Reinforcement Learning for Virtual Character Animation Control

Virtual character animation control is a problem for which Reinforcement...

Recall Traces: Backtracking Models for Efficient Reinforcement Learning

In many environments only a tiny subset of all states yield high reward....

Systematic Generalization for Predictive Control in Multivariate Time Series

Prior work has focused on evaluating the ability of neural networks to r...