HIPODE: Enhancing Offline Reinforcement Learning with High-Quality Synthetic Data from a Policy-Decoupled Approach

06/10/2023
by Shixi Lian et al.

Offline reinforcement learning (ORL) has gained attention as a means of training reinforcement learning models using pre-collected static data. To address the issue of limited data coverage and improve downstream ORL performance, recent work has attempted to expand the dataset's coverage through data augmentation. However, most of these methods are tied to a specific policy (policy-dependent): the generated data are only guaranteed to support the current downstream ORL policy, which limits their usefulness for other downstream policies. Moreover, the quality of the synthetic data is often not well controlled, which limits the potential to further improve the downstream policy. To tackle these issues, we propose HIgh-quality POlicy-DEcoupled (HIPODE), a novel data augmentation method for ORL. On the one hand, HIPODE generates high-quality synthetic data by using a negative-sampling technique to select, from candidate states, those near the dataset distribution with potentially high value. On the other hand, HIPODE is policy-decoupled, and thus can be used as a common plug-in for any downstream ORL process. We conduct experiments on the widely studied TD3BC and CQL algorithms, and the results show that HIPODE outperforms the state-of-the-art policy-decoupled data augmentation method and most prevalent model-based ORL methods on D4RL benchmarks.
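To make the selection mechanism concrete, below is a minimal sketch of the candidate-state selection step described in the abstract. It assumes a learned stochastic dynamics model that proposes next states near the data distribution and a value function trained with negative sampling to penalize out-of-distribution states; all names (dynamics_model, value_fn, select_synthetic_next_state) are illustrative, not taken from the paper's implementation.

```python
import numpy as np

def select_synthetic_next_state(state, dynamics_model, value_fn, num_candidates=10):
    """Pick the candidate next state with the highest estimated value.

    Hypothetical helper illustrating HIPODE-style selection: the dynamics
    model proposes several plausible next states near the dataset
    distribution, and a negative-sampling-trained value function ranks them,
    so only high-value, in-distribution states enter the synthetic data.
    """
    # Propose several candidate next states from the stochastic dynamics model.
    candidates = [dynamics_model.sample(state) for _ in range(num_candidates)]
    # Score each candidate with the conservatively trained value function.
    values = np.array([value_fn(s) for s in candidates])
    # Keep the highest-value candidate as the synthetic transition target.
    return candidates[int(np.argmax(values))]
```

Because the selection depends only on the dataset-trained dynamics model and value function, and not on any downstream policy, the resulting synthetic transitions can be reused across different ORL algorithms, which is what makes the approach policy-decoupled.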


