OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation

06/21/2021
by   Jongmin Lee, et al.

We consider the offline reinforcement learning (RL) setting, where the agent aims to optimize the policy solely from the data without further environment interactions. In offline RL, distributional shift is the primary source of difficulty: it arises when the target policy being optimized deviates from the behavior policy used for data collection. This typically causes overestimation of action values, which poses severe problems for model-free algorithms that use bootstrapping. To mitigate the problem, prior offline RL algorithms often used sophisticated techniques that encourage underestimation of action values, which introduce an additional set of hyperparameters that need to be tuned properly. In this paper, we present an offline RL algorithm that prevents overestimation in a more principled way. Our algorithm, OptiDICE, directly estimates the stationary distribution corrections of the optimal policy and does not rely on policy gradients, unlike previous offline RL algorithms. Using an extensive set of benchmark datasets for offline RL, we show that OptiDICE performs competitively with the state-of-the-art methods.
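To make the term "stationary distribution correction" concrete, the sketch below computes it exactly in a tiny tabular MDP. It is an illustration of the quantity only, not the OptiDICE algorithm: the MDP, both policies, and all names (`P`, `R`, `P0`, `stationary_distribution`) are hypothetical, the target policy is fixed rather than optimized, and the data distribution is assumed to equal the discounted occupancy of the behavior policy. The correction is taken to be the ratio w(s, a) = d_target(s, a) / d_data(s, a).

```python
GAMMA = 0.9

# Hypothetical 2-state, 2-action MDP (all numbers are made up for illustration).
# P[s][a][s2] is the probability of moving from state s to s2 under action a.
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.1, 0.9]],
]
R = [[0.0, 1.0], [0.5, 2.0]]   # R[s][a]: reward
P0 = [1.0, 0.0]                # initial state distribution


def stationary_distribution(policy, iters=2000):
    """Discounted state-action occupancy d(s, a) of a policy.

    policy[s][a] is the probability of action a in state s. The state
    occupancy is the fixed point of d = (1 - GAMMA) * P0 + GAMMA * P_pi^T d,
    found here by plain fixed-point iteration.
    """
    n_s = len(P)
    d_s = list(P0)
    for _ in range(iters):
        nxt = [(1 - GAMMA) * P0[s2] for s2 in range(n_s)]
        for s in range(n_s):
            for a in range(len(P[s])):
                for s2 in range(n_s):
                    nxt[s2] += GAMMA * d_s[s] * policy[s][a] * P[s][a][s2]
        d_s = nxt
    return [[d_s[s] * policy[s][a] for a in range(len(P[s]))] for s in range(n_s)]


behavior = [[0.5, 0.5], [0.5, 0.5]]   # assumed data-collection policy
target = [[0.1, 0.9], [0.2, 0.8]]     # assumed fixed target policy

d_data = stationary_distribution(behavior)
d_target = stationary_distribution(target)

# Stationary distribution correction: how much each (s, a) pair in the data
# must be re-weighted to look as if it came from the target policy.
w = [[d_target[s][a] / d_data[s][a] for a in range(2)] for s in range(2)]

# Normalized expected reward of the target policy, computed directly and by
# re-weighting the data distribution; the two agree by construction.
value_direct = sum(d_target[s][a] * R[s][a] for s in range(2) for a in range(2))
value_reweighted = sum(
    d_data[s][a] * w[s][a] * R[s][a] for s in range(2) for a in range(2)
)
```

In this sketch the target policy's value is expressed as a weighted average over the data distribution, so no action values are bootstrapped. In practice the transition model is unknown and the ratios would have to be estimated from samples.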

research
06/14/2022

Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning

Offline reinforcement learning (RL) extends the paradigm of classical RL...
research
04/19/2022

COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation

We consider the offline constrained reinforcement learning (RL) problem,...
research
05/24/2023

Matrix Estimation for Offline Reinforcement Learning with Low-Rank Structure

We consider offline Reinforcement Learning (RL), where the agent does no...
research
03/08/2021

Instabilities of Offline RL with Pre-Trained Neural Representation

In offline reinforcement learning (RL), we seek to utilize offline data ...
research
05/27/2022

Why So Pessimistic? Estimating Uncertainties for Offline RL through Ensembles, and Why Their Independence Matters

Motivated by the success of ensembles for uncertainty estimation in supe...
research
06/24/2023

Fighting Uncertainty with Gradients: Offline Reinforcement Learning via Diffusion Score Matching

Offline optimization paradigms such as offline Reinforcement Learning (R...
research
10/24/2021

SCORE: Spurious COrrelation REduction for Offline Reinforcement Learning

Offline reinforcement learning (RL) aims to learn the optimal policy fro...
