Look Beneath the Surface: Exploiting Fundamental Symmetry for Sample-Efficient Offline RL

06/07/2023
by Peng Cheng, et al.

Offline reinforcement learning (RL) offers an appealing approach to real-world tasks by learning policies from pre-collected datasets without interacting with the environment. However, the performance of existing offline RL algorithms depends heavily on the scale and state-action space coverage of the datasets. Real-world data collection is often expensive and uncontrollable, which leads to small, narrowly covered datasets and poses significant challenges for practical deployments of offline RL. In this paper, we provide a new insight: leveraging the fundamental symmetry of system dynamics can substantially enhance offline RL performance on small datasets. Specifically, we propose a Time-reversal symmetry (T-symmetry) enforced Dynamics Model (TDM), which establishes consistency between a pair of forward and reverse latent dynamics. TDM provides both well-behaved representations for small datasets and a new reliability measure for out-of-distribution (OOD) samples based on their compliance with T-symmetry. These can be readily used to construct a new offline RL algorithm (TSRL) with less conservative policy constraints and a reliable latent-space data augmentation procedure. Based on extensive experiments, we find that TSRL achieves strong performance on small benchmark datasets with as few as 1% of the original samples, significantly outperforming recent offline RL algorithms in terms of data efficiency and generalizability.
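To make the idea of "compliance with T-symmetry" more concrete, the sketch below scores a single transition (z, a, z') with a forward model that should predict the latent time derivative dz/dt and a reverse model that should predict -dz/dt; time-reversal consistency then asks that forward(z, a) equal -reverse(z', a). This is a minimal illustration under our own assumptions, not the paper's TDM: the finite-difference derivative, the squared-error terms, the unit time step, and all names (`t_symmetry_score`, `toy_forward`, `toy_reverse`) are hypothetical, and the learned encoders and dynamics networks are replaced by hand-written toy functions.

```python
from typing import Callable, List

Vector = List[float]
# Hypothetical signature for a learned latent dynamics model: (latent, action) -> latent derivative
Dynamics = Callable[[Vector, Vector], Vector]


def latent_derivative(z: Vector, z_next: Vector) -> Vector:
    """One-step finite-difference estimate of dz/dt (assumes a unit time step)."""
    return [zn - zc for zc, zn in zip(z, z_next)]


def sq_norm(v: Vector) -> float:
    """Squared Euclidean norm of a vector."""
    return sum(x * x for x in v)


def t_symmetry_score(z: Vector, a: Vector, z_next: Vector,
                     forward: Dynamics, reverse: Dynamics) -> float:
    """Hypothetical per-sample T-symmetry score; lower means more compliant.

    forward(z, a) should predict dz/dt, reverse(z_next, a) should predict -dz/dt,
    and time-reversal consistency requires forward(z, a) == -reverse(z_next, a).
    """
    dz = latent_derivative(z, z_next)
    f = forward(z, a)
    g = reverse(z_next, a)
    fwd_err = sq_norm([fi - di for fi, di in zip(f, dz)])      # forward prediction error
    rev_err = sq_norm([gi + di for gi, di in zip(g, dz)])      # reverse prediction error
    consistency = sq_norm([fi + gi for fi, gi in zip(f, g)])   # T-symmetry violation
    return fwd_err + rev_err + consistency


# Toy 1-D stand-ins for learned models: the latent "velocity" equals the action.
def toy_forward(z: Vector, a: Vector) -> Vector:
    return list(a)


def toy_reverse(z_next: Vector, a: Vector) -> Vector:
    return [-ai for ai in a]


if __name__ == "__main__":
    # A transition consistent with the toy dynamics scores 0.0.
    print(t_symmetry_score([0.0], [1.0], [1.0], toy_forward, toy_reverse))
    # A transition that jumps too far is penalized by both prediction errors (8.0).
    print(t_symmetry_score([0.0], [1.0], [3.0], toy_forward, toy_reverse))
```

In this toy setting the consistency term is always zero because the two models are exact negatives of each other, so the penalty for the implausible transition comes entirely from the forward and reverse prediction errors. With learned models, a score of this kind could, for instance, be thresholded to decide which augmented latent samples to trust; how TSRL actually uses its reliability measure is described in the full paper.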
