Adversarially Trained Actor Critic for Offline Reinforcement Learning

02/05/2022
by Ching-An Cheng, et al.

We propose Adversarially Trained Actor Critic (ATAC), a new model-free algorithm for offline reinforcement learning (RL) under insufficient data coverage. ATAC is based on a two-player Stackelberg game framing of offline RL: a policy actor competes against an adversarially trained value critic, which finds data-consistent scenarios where the actor is inferior to the data-collection behavior policy. We prove that, when the actor attains no regret in the two-player game, running ATAC produces a policy that provably 1) outperforms the behavior policy over a wide range of hyperparameters, and 2) competes with the best policy covered by data with appropriately chosen hyperparameters. Compared with existing works, our framework notably offers both theoretical guarantees for general function approximation and a deep RL implementation scalable to complex environments and large datasets. On the D4RL benchmark, ATAC consistently outperforms state-of-the-art offline RL algorithms on a range of continuous control tasks.
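The actor-versus-critic game in the abstract can be made concrete with a small sketch. This is an illustration under stated assumptions, not the authors' implementation. It assumes the critic minimizes the actor's value gap over the behavior data plus a weighted squared Bellman error, and that the actor maximizes the same gap. All function names, the weight `beta`, and the discount `gamma` are hypothetical and do not appear in the abstract.

```python
from statistics import fmean


def relative_value_gap(f_actor, f_data):
    """Mean of f(s, pi(s)) - f(s, a) over the dataset.

    f_actor[i]: critic value of the actor's action at state i (assumed input).
    f_data[i]:  critic value of the logged behavior action at state i.
    """
    return fmean(fa - fd for fa, fd in zip(f_actor, f_data))


def bellman_error(f_data, rewards, f_next_actor, gamma=0.99):
    """Mean squared one-step Bellman residual on the logged transitions."""
    return fmean(
        (fd - (r + gamma * fn)) ** 2
        for fd, r, fn in zip(f_data, rewards, f_next_actor)
    )


def critic_loss(f_actor, f_data, rewards, f_next_actor, beta=1.0, gamma=0.99):
    """Assumed critic objective (minimized): make the actor look worse than
    the behavior policy while staying consistent with the data."""
    gap = relative_value_gap(f_actor, f_data)
    return gap + beta * bellman_error(f_data, rewards, f_next_actor, gamma)


def actor_objective(f_actor, f_data):
    """Assumed actor objective (maximized): the same relative value gap."""
    return relative_value_gap(f_actor, f_data)
```

Under these assumptions, an actor that copies the behavior policy has `f_actor == f_data`, so the gap is zero whatever critic is chosen. That gives one intuition for why the adversarial critic cannot make such an actor look worse than the behavior policy.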


research
01/30/2023

Importance Weighted Actor-Critic for Optimal Conservative Offline Reinforcement Learning

We propose A-Crab (Actor-Critic Regularized by Average Bellman error), a...
research
02/18/2023

Efficient exploration via epistemic-risk-seeking policy optimization

Exploration remains a key challenge in deep reinforcement learning (RL)....
research
04/20/2023

IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies

Effective offline RL methods require properly handling out-of-distributi...
research
08/19/2021

Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning

Actor-critic methods are widely used in offline reinforcement learning p...
research
10/31/2021

An Actor-Critic Method for Simulation-Based Optimization

We focus on a simulation-based optimization problem of choosing the best...
research
02/28/2020

Self-Tuning Deep Reinforcement Learning

Reinforcement learning (RL) algorithms often require expensive manual or...
research
10/09/2019

Integrating Behavior Cloning and Reinforcement Learning for Improved Performance in Sparse Reward Environments

This paper investigates how to efficiently transition and update policie...
