Distance-Sensitive Offline Reinforcement Learning

05/23/2022
by Jianxiong Li, et al.

In offline reinforcement learning (RL), one issue detrimental to policy learning is the accumulation of errors in deep Q functions in out-of-distribution (OOD) areas. Unfortunately, existing offline RL methods are often over-conservative, which inevitably hurts generalization outside the data distribution. In our study, one interesting observation is that deep Q functions approximate well inside the convex hull of the training data. Inspired by this, we propose a new method, DOGE (Distance-sensitive Offline RL with better GEneralization). DOGE marries dataset geometry with deep function approximators in offline RL and enables exploitation in generalizable OOD areas rather than strictly constraining the policy within the data distribution. Specifically, DOGE trains a state-conditioned distance function that can be readily plugged into standard actor-critic methods as a policy constraint. Simple yet elegant, our algorithm enjoys better generalization than state-of-the-art methods on D4RL benchmarks. Theoretical analysis demonstrates the superiority of our approach over existing methods that are based solely on data distribution or support constraints.
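
To make the idea of a distance function acting as a policy constraint more concrete, here is a minimal toy sketch in pure Python. It is not the paper's implementation: the hinge penalty, the names `toy_q`, `toy_distance`, `threshold`, and `multiplier`, and the one-dimensional action set are all assumptions chosen for illustration. The toy Q function deliberately grows with the action, mimicking overestimation far from the data, and the toy distance is the mean absolute gap to a handful of dataset actions.

```python
from typing import Callable, Sequence

# Hypothetical toy dataset of 1-D actions (illustrative only).
DATASET_ACTIONS = [-0.2, 0.0, 0.1]


def toy_q(state: float, action: float) -> float:
    """Toy Q function that keeps growing away from the data (overestimation)."""
    return action


def toy_distance(state: float, action: float) -> float:
    """Toy stand-in for a learned distance: mean gap to the dataset actions."""
    return sum(abs(action - a) for a in DATASET_ACTIONS) / len(DATASET_ACTIONS)


def constrained_score(
    q: Callable[[float, float], float],
    g: Callable[[float, float], float],
    state: float,
    action: float,
    threshold: float,
    multiplier: float,
) -> float:
    """Q value minus a penalty that applies only when the distance exceeds the threshold."""
    violation = max(0.0, g(state, action) - threshold)
    return q(state, action) - multiplier * violation


def select_action(
    q: Callable[[float, float], float],
    g: Callable[[float, float], float],
    state: float,
    candidates: Sequence[float],
    threshold: float,
    multiplier: float,
) -> float:
    """Pick the candidate action with the highest penalized score."""
    return max(
        candidates,
        key=lambda a: constrained_score(q, g, state, a, threshold, multiplier),
    )


CANDIDATES = [-1.0, -0.5, 0.0, 0.25, 0.5, 1.0]

# With no penalty, the policy chases the overestimated Q value far from the data.
unconstrained = select_action(toy_q, toy_distance, 0.0, CANDIDATES, 0.3, 0.0)

# With the penalty, it settles on an action near the data but not in the dataset.
constrained = select_action(toy_q, toy_distance, 0.0, CANDIDATES, 0.3, 10.0)
```

In this toy setting the constrained choice is an action that does not appear in the dataset yet lies close to it, which is the kind of mild out-of-distribution exploitation the abstract describes. A real actor-critic method would replace the candidate search with gradient updates on a policy network and learn both functions from data.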


