Model-based Trajectory Stitching for Improved Offline Reinforcement Learning

11/21/2022
by Charles A. Hepburn, et al.

In many real-world applications, collecting large, high-quality datasets may be too costly or impractical. Offline reinforcement learning (RL) aims to infer an optimal decision-making policy from a fixed set of data. Extracting as much information as possible from historical data is therefore vital for good performance once the policy is deployed. We propose a model-based data augmentation strategy, Trajectory Stitching (TS), to improve the quality of sub-optimal historical trajectories. TS introduces unseen actions that join previously disconnected states: using a probabilistic notion of state reachability, it effectively 'stitches' together parts of the historical demonstrations to generate new, higher-quality ones. A stitching event consists of a transition between a pair of observed states through a synthetic and highly probable action. New actions are introduced only when they are expected to be beneficial according to an estimated state-value function. We show that using this data augmentation strategy jointly with behavioural cloning (BC) leads to improvements over the behaviour-cloned policy from the original dataset. Improving over the BC policy could then serve as a launchpad for online RL through planning and demonstration-guided RL.
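The stitching rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the learned dynamics model is replaced by a toy Gaussian score centred on the observed next state, that state values are already given, and that the synthetic action comes from a toy inverse-dynamics rule with s' = s + a. The names `reach_score`, `find_stitch`, `synthetic_action`, and the threshold `p_min` are hypothetical.

```python
import math

def reach_score(observed_next, candidate, sigma=0.5):
    """Toy reachability score in (0, 1]; stands in for a learned dynamics model."""
    sq_dist = sum((c - o) ** 2 for c, o in zip(candidate, observed_next))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def find_stitch(observed_next, v_next, candidates, values, p_min=0.6):
    """Return the index of the best state to stitch to, or None to keep the original."""
    best, best_v = None, v_next
    for j, cand in enumerate(candidates):
        # Stitch only to states that are plausible to reach AND have higher value.
        reachable = reach_score(observed_next, cand) >= p_min
        if reachable and values[j] > best_v:
            best, best_v = j, values[j]
    return best

def synthetic_action(state, target):
    """Toy inverse dynamics, assuming s' = s + a (an assumption for illustration)."""
    return tuple(t - s for s, t in zip(state, target))

# One observed transition and three other observed states (made-up numbers).
state = (0.0, 0.0)
observed_next = (1.0, 0.0)
candidates = [(1.1, 0.0), (5.0, 5.0), (0.9, 0.1)]
values = [2.0, 10.0, 0.5]  # assumed estimates of the state-value function

j = find_stitch(observed_next, v_next=1.0, candidates=candidates, values=values)
if j is not None:
    print("stitch to", candidates[j], "via action", synthetic_action(state, candidates[j]))
```

In this toy setup the far-away state with the highest value is rejected because it is not reachable, and the nearby low-value state is rejected because it would not be beneficial, so only the first candidate qualifies.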

research
12/08/2022

Model-based trajectory stitching for improved behavioural cloning and its applications

Behavioural cloning (BC) is a commonly used imitation learning method to...
research
06/10/2023

HIPODE: Enhancing Offline Reinforcement Learning with High-Quality Synthetic Data from a Policy-Decoupled Approach

Offline reinforcement learning (ORL) has gained attention as a means of ...
research
02/13/2021

PerSim: Data-Efficient Offline Reinforcement Learning with Heterogeneous Agents via Personalized Simulators

We consider offline reinforcement learning (RL) with heterogeneous agent...
research
11/30/2022

Efficient Reinforcement Learning Through Trajectory Generation

A key barrier to using reinforcement learning (RL) in many real-world ap...
research
08/07/2023

Exploiting Generalization in Offline Reinforcement Learning via Unseen State Augmentations

Offline reinforcement learning (RL) methods strike a balance between exp...
research
12/31/2011

T-Learning

Traditional Reinforcement Learning (RL) has focused on problems involvin...
research
02/23/2021

DeepThermal: Combustion Optimization for Thermal Power Generating Units Using Offline Reinforcement Learning

Thermal power generation plays a dominant role in the world's electricit...
