Importance Weighted Actor-Critic for Optimal Conservative Offline Reinforcement Learning

01/30/2023, by Hanlin Zhu, et al.

We propose A-Crab (Actor-Critic Regularized by Average Bellman error), a new algorithm for offline reinforcement learning (RL) in complex environments with insufficient data coverage. Our algorithm combines the marginalized importance sampling framework with the actor-critic paradigm, where the critic returns evaluations of the actor (policy) that are pessimistic relative to the offline data and have a small average (importance-weighted) Bellman error. Compared to existing methods, our algorithm simultaneously offers a number of advantages: (1) It is practical and achieves the optimal statistical rate of 1/√N, where N is the size of the offline dataset, in converging to the best policy covered in the offline dataset, even when combined with general function approximators. (2) It relies on a weaker average notion of policy coverage (compared to the ℓ_∞ single-policy concentrability) that exploits the structure of policy visitations. (3) It outperforms the data-collection behavior policy over a wide range of hyperparameters and is the first algorithm to do so without solving a minimax optimization problem.
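To make the abstract's recipe concrete, here is a minimal toy sketch, not the authors' implementation: a critic is chosen pessimistically from a small finite function class, penalized by the average (rather than per-sample worst-case) importance-weighted Bellman error, and the actor takes an exponentiated-gradient (mirror-ascent) step. The toy MDP, the random function class `F`, and the uniform importance weights are all illustrative assumptions; in A-Crab the weights come from the marginalized importance sampling framework.

```python
import numpy as np

# Illustrative toy sketch (assumptions, not the paper's algorithm verbatim):
# pessimistic critic selection regularized by *average* importance-weighted
# Bellman error, followed by a mirror-ascent actor update.
rng = np.random.default_rng(0)
nS, nA, gamma = 3, 2, 0.9

R = rng.random((nS, nA))                        # toy reward table
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # toy transition kernel
d0 = np.ones(nS) / nS                           # initial-state distribution

# Offline dataset of (s, a, r, s') tuples from a uniform behavior policy.
data = []
for _ in range(200):
    s, a = rng.integers(nS), rng.integers(nA)
    data.append((s, a, R[s, a], rng.choice(nS, p=P[s, a])))

# Placeholder importance weights; A-Crab would use marginalized
# importance weights, uniform weights keep the sketch short.
w = np.ones(len(data))

def avg_bellman_error(f, pi):
    """Average signed, importance-weighted Bellman error of critic f."""
    total = 0.0
    for (s, a, r, s2), wi in zip(data, w):
        total += wi * (f[s, a] - (r + gamma * f[s2] @ pi[s2]))
    return abs(total) / len(data)   # averaged, not per-sample l_inf

def pessimistic_critic(F, pi, beta=1.0):
    """Pessimism: among critics with small average Bellman error,
    prefer the one assigning the policy the smallest value
    (penalized / Lagrangian form of the constraint)."""
    def score(f):
        value = sum(d0[s] * (f[s] @ pi[s]) for s in range(nS))
        return value + beta * avg_bellman_error(f, pi)
    return min(F, key=score)

# A tiny random function class standing in for general function approximators.
F = [rng.random((nS, nA)) * 5 for _ in range(20)]

pi = np.ones((nS, nA)) / nA          # start from the uniform policy
eta = 0.5
for _ in range(10):                  # actor-critic loop
    q = pessimistic_critic(F, pi)
    pi = pi * np.exp(eta * q)        # exponentiated-gradient actor step
    pi /= pi.sum(axis=1, keepdims=True)
```

The mirror-ascent actor step mirrors the no-regret policy updates common in this line of work; swapping the finite class `F` for a neural critic trained on the same penalized objective gives the practical variant the abstract alludes to.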


Related research

02/05/2022 · Adversarially Trained Actor Critic for Offline Reinforcement Learning
We propose Adversarially Trained Actor Critic (ATAC), a new model-free a...

10/30/2018 · Relative Importance Sampling For Off-Policy Actor-Critic in Deep Reinforcement Learning
Off-policy learning is more unstable compared to on-policy learning in r...

08/19/2021 · Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning
Actor-critic methods are widely used in offline reinforcement learning p...

04/20/2023 · IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies
Effective offline RL methods require properly handling out-of-distributi...

02/28/2023 · The In-Sample Softmax for Offline Reinforcement Learning
Reinforcement learning (RL) agents can leverage batches of previously co...

10/13/2021 · Adapting to Dynamic LEO-B5G Systems: Meta-Critic Learning Based Efficient Resource Scheduling
Low earth orbit (LEO) satellite-assisted communications have been consid...

03/14/2021 · Offline Reinforcement Learning with Fisher Divergence Critic Regularization
Many modern approaches to offline Reinforcement Learning (RL) utilize be...
