Warm-Start Actor-Critic: From Approximation Error to Sub-optimality Gap

06/20/2023
by Hang Wang, et al.

Warm-Start reinforcement learning (RL), aided by a prior policy obtained from offline training, is emerging as a promising approach for practical applications. Recent empirical studies have shown that the performance of Warm-Start RL can improve quickly in some cases but stagnate in others, especially when function approximation is used. To this end, the primary objective of this work is to build a fundamental understanding of "whether and when online learning can be significantly accelerated by a warm-start policy from offline RL." Specifically, we consider the widely used Actor-Critic (A-C) method with a prior policy. We first quantify the approximation errors in the Actor update and the Critic update, respectively. Next, we cast the Warm-Start A-C algorithm as Newton's method with perturbation and study the impact of the approximation errors on the finite-time learning performance under inaccurate Actor/Critic updates. Under some general technical conditions, we derive upper bounds that shed light on achieving the desired finite-time learning performance with the Warm-Start A-C algorithm. In particular, our findings reveal that it is essential to reduce the algorithmic bias in online learning. We also obtain lower bounds on the sub-optimality gap of the Warm-Start A-C algorithm to quantify the impact of bias and error propagation.
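For intuition about the setup, the sketch below warm-starts a tabular Actor-Critic learner on a toy, randomly generated MDP: the actor's logits are initialized from a stand-in "offline" prior policy, and online learning then proceeds with a TD(0) critic and a policy-gradient actor. The environment, the prior policy, and all hyperparameters are illustrative assumptions for exposition only, not the algorithm or analysis from the paper.

```python
# Minimal warm-start Actor-Critic sketch on a toy MDP (illustrative assumptions only).
import numpy as np

rng = np.random.default_rng(0)

# Toy MDP: random transition kernel P[s, a] -> next-state distribution, rewards R[s, a].
n_states, n_actions, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

def softmax(x):
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Warm start: actor logits initialized from a prior policy "obtained offline".
# Here the prior is just a noisy greedy policy w.r.t. R -- a hypothetical stand-in.
prior_logits = 2.0 * (R == R.max(axis=1, keepdims=True)) + 0.1 * rng.standard_normal((n_states, n_actions))
theta = prior_logits.copy()     # actor parameters (warm-started)
V = np.zeros(n_states)          # critic: tabular state-value estimates

alpha_critic, alpha_actor, n_steps = 0.1, 0.05, 20_000
s = 0
for _ in range(n_steps):
    pi = softmax(theta[s])
    a = rng.choice(n_actions, p=pi)
    s_next = rng.choice(n_states, p=P[s, a])
    r = R[s, a]

    # Critic update: TD(0) on the state-value function.
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += alpha_critic * td_error

    # Actor update: policy-gradient step with the TD error as the advantage signal.
    grad_log_pi = -pi
    grad_log_pi[a] += 1.0
    theta[s] += alpha_actor * td_error * grad_log_pi

    s = s_next

print("learned policy (argmax action per state):", softmax(theta).argmax(axis=1))
```

In this sketch the warm start only changes the initialization of the actor; the critic is still learned online, which is exactly where the paper's concern about approximation errors and bias in the Actor/Critic updates enters.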

Related research

02/26/2018
Addressing Function Approximation Error in Actor-Critic Methods
In value-based reinforcement learning methods such as deep Q-learning, f...

04/17/2017
Effective Warm Start for the Online Actor-Critic Reinforcement Learning based mHealth Intervention
Online reinforcement learning (RL) is increasingly popular for the perso...

10/18/2022
Finite-time analysis of single-timescale actor-critic
Despite the great empirical success of actor-critic methods, its finite-...

10/14/2021
Offline Reinforcement Learning with Soft Behavior Regularization
Most prior approaches to offline reinforcement learning (RL) utilize beh...

08/19/2021
Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning
Actor-critic methods are widely used in offline reinforcement learning p...

04/20/2023
IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies
Effective offline RL methods require properly handling out-of-distributi...

07/05/2023
LLQL: Logistic Likelihood Q-Learning for Reinforcement Learning
Currently, research on Reinforcement learning (RL) can be broadly classi...
