Offline Reinforcement Learning with On-Policy Q-Function Regularization

07/25/2023
by Laixi Shi, et al.

The core challenge of offline reinforcement learning (RL) is dealing with the (potentially catastrophic) extrapolation error induced by the distribution shift between the historical dataset and the desired policy. A large portion of prior work tackles this challenge by implicitly or explicitly regularizing the learned policy towards the behavior policy, which is hard to estimate reliably in practice. In this work, we propose to regularize towards the Q-function of the behavior policy instead of the behavior policy itself, under the premise that the Q-function can be estimated more reliably and easily by a SARSA-style estimate and handles the extrapolation error more straightforwardly. We propose two algorithms that exploit the estimated Q-function through regularization, and demonstrate that they exhibit strong performance on the D4RL benchmarks.
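To make the idea of a SARSA-style estimate concrete, the following is a minimal tabular sketch and not the authors' implementation. It assumes small discrete state and action spaces and a dataset of (state, action, reward, next state, next action, done) tuples logged by the behavior policy. The function name `sarsa_q_estimate`, its parameters, and the toy data are hypothetical. The point it illustrates is that the bootstrap target uses only the next action actually recorded in the dataset, so the estimate never queries the Q-function at out-of-distribution actions.

```python
from typing import List, Tuple

# (state, action, reward, next_state, next_action, done).
# Integer indices are a hypothetical tabular stand-in for continuous tasks.
Transition = Tuple[int, int, float, int, int, bool]


def sarsa_q_estimate(
    transitions: List[Transition],
    n_states: int,
    n_actions: int,
    gamma: float = 0.99,
    lr: float = 0.5,
    epochs: int = 200,
) -> List[List[float]]:
    """Estimate the behavior policy's Q-function from logged transitions."""
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(epochs):
        for s, a, r, s_next, a_next, done in transitions:
            # Bootstrap from the action the behavior policy actually took
            # next, not from a max over actions as in Q-learning.
            bootstrap = 0.0 if done else q[s_next][a_next]
            target = r + gamma * bootstrap
            q[s][a] += lr * (target - q[s][a])
    return q


if __name__ == "__main__":
    # Toy two-step episode: state 0 -> state 1 -> terminal.
    data = [(0, 0, 1.0, 1, 0, False), (1, 0, 2.0, 1, 0, True)]
    print(sarsa_q_estimate(data, n_states=2, n_actions=2, gamma=0.5))
```

Entries for state-action pairs that never appear in the dataset stay at their initial value, which is one way to see why a downstream regularizer would only be trusted on in-distribution pairs.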

research
10/12/2021

Offline Reinforcement Learning with Implicit Q-Learning

Offline reinforcement learning requires reconciling two conflicting aims...
research
06/14/2022

Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning

Offline reinforcement learning (RL) extends the paradigm of classical RL...
research
06/09/2022

Mildly Conservative Q-Learning for Offline Reinforcement Learning

Offline reinforcement learning (RL) defines the task of learning from a ...
research
06/09/2023

In-Sample Policy Iteration for Offline Reinforcement Learning

Offline reinforcement learning (RL) seeks to derive an effective control...
research
05/21/2022

User-Interactive Offline Reinforcement Learning

Offline reinforcement learning algorithms still lack trust in practice d...
research
03/13/2020

Taylor Expansion Policy Optimization

In this work, we investigate the application of Taylor expansions in rei...
research
10/17/2022

Boosting Offline Reinforcement Learning via Data Rebalancing

Offline reinforcement learning (RL) is challenged by the distributional ...
