Improving Offline RL by Blending Heuristics

06/01/2023
by Sinong Geng, et al.

We propose Heuristic Blending (HUBL), a simple performance-improving technique for a broad class of offline RL algorithms based on value bootstrapping. HUBL modifies the Bellman operators used in these algorithms, partially replacing the bootstrapped values with Monte-Carlo returns as heuristics. For trajectories with higher returns, HUBL relies more on the heuristics and less on bootstrapping; otherwise, it leans more heavily on bootstrapping. We show that this idea can be implemented simply by relabeling the offline dataset with adjusted rewards and discount factors, making HUBL readily usable with many existing offline RL implementations. We theoretically prove that HUBL reduces offline RL's complexity and thus improves its finite-sample performance. Furthermore, we empirically demonstrate that HUBL consistently improves the policy quality of four state-of-the-art bootstrapping-based offline RL algorithms (ATAC, CQL, TD3+BC, and IQL), by 9% on average over 27 datasets of the D4RL and Meta-World benchmarks.
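To make the relabeling idea concrete, the sketch below shows one plausible reading of it in Python. The function name hubl_relabel, the choice of lambda_max, the normalization of trajectory returns to set the per-trajectory blending weight, and the specific forms r_tilde = r + lambda * gamma * h and gamma_tilde = (1 - lambda) * gamma are illustrative assumptions inferred from the abstract, not details taken verbatim from the paper.

import numpy as np

def hubl_relabel(trajectories, gamma=0.99, lambda_max=0.5):
    # trajectories: list of trajectories, each a list of (state, action, reward) steps.
    # Returns a flat list of (state, action, relabeled_reward, relabeled_discount).

    # Per-trajectory Monte-Carlo returns-to-go, used as the heuristic h.
    returns_to_go = []
    for traj in trajectories:
        rewards = [step[2] for step in traj]
        h = np.zeros(len(rewards) + 1)
        for t in reversed(range(len(rewards))):
            h[t] = rewards[t] + gamma * h[t + 1]
        returns_to_go.append(h)

    # Blending weight per trajectory: higher total return -> larger lambda,
    # i.e. rely more on the heuristic and less on bootstrapping.
    total_returns = np.array([h[0] for h in returns_to_go])
    span = total_returns.max() - total_returns.min()
    norm = (total_returns - total_returns.min()) / (span + 1e-8)
    lambdas = lambda_max * norm

    relabeled = []
    for traj, h, lam in zip(trajectories, returns_to_go, lambdas):
        for t, (s, a, r) in enumerate(traj):
            # Fold a lambda-fraction of the heuristic target into the reward
            # and shrink the discount by the same fraction, so bootstrapping
            # keeps weight (1 - lambda) in the relabeled Bellman target.
            r_tilde = r + lam * gamma * h[t + 1]
            gamma_tilde = gamma * (1.0 - lam)
            relabeled.append((s, a, r_tilde, gamma_tilde))
    return relabeled

Under these assumptions, a bootstrapping-based offline RL algorithm would then train on the relabeled tuples exactly as on the original dataset, with the per-transition discount gamma_tilde replacing the single global discount factor.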


