A Single-Loop Deep Actor-Critic Algorithm for Constrained Reinforcement Learning with Provable Convergence

06/10/2023
by   Kexuan Wang, et al.

Abstract – Deep Actor-Critic algorithms, which combine Actor-Critic with deep neural networks (DNNs), are among the most prevalent reinforcement learning algorithms for decision-making problems in simulated environments. However, existing deep Actor-Critic algorithms are not yet mature enough to solve realistic problems with non-convex stochastic constraints and a high cost of interacting with the environment. In this paper, we propose a single-loop deep Actor-Critic (SLDAC) algorithmic framework for general constrained reinforcement learning (CRL) problems. In the actor step, the constrained stochastic successive convex approximation (CSSCA) method is applied to handle the non-convex stochastic objective and constraints. In the critic step, the critic DNNs are updated only once or a finite number of times per iteration, which simplifies the algorithm to a single-loop framework. Existing works, by contrast, require enough critic updates in each iteration for the inner loop to converge sufficiently well. Moreover, the variance of the policy gradient estimate is reduced by reusing observations from old policies. Together, the single-loop design and the observation reuse reduce both the agent-environment interaction cost and the computational complexity. Although the single-loop design and observation reuse bias the policy gradient estimate, we prove that SLDAC with a feasible initial point converges almost surely to a Karush-Kuhn-Tucker (KKT) point of the original problem. Simulations show that the SLDAC algorithm can achieve superior performance with much lower interaction cost.
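The single-loop structure described in the abstract can be illustrated with a toy sketch. This is not the SLDAC algorithm itself: it omits the constraints and the CSSCA convex surrogate, and it replaces the critic DNNs with a two-entry table. The one-state, two-action problem, the function name `run_single_loop`, the cost values, and all step sizes are assumptions made only for illustration. What it does show is one critic update per iteration (no inner loop) and a recursively averaged gradient estimate that retains information from samples drawn under earlier policies.

```python
import math
import random


def run_single_loop(num_iters=2000, seed=0):
    """Toy single-loop actor-critic on a one-state, two-action problem."""
    rng = random.Random(seed)
    cost = [1.0, 0.0]   # hypothetical per-action costs; action 1 is cheaper
    theta = 0.0         # actor parameter: logit of choosing action 1
    q = [0.0, 0.0]      # critic: tabular estimate of each action's cost
    g = 0.0             # recursively averaged policy-gradient estimate

    for t in range(1, num_iters + 1):
        p1 = 1.0 / (1.0 + math.exp(-theta))  # probability of action 1
        a = 1 if rng.random() < p1 else 0     # one environment interaction
        c = cost[a] + rng.gauss(0.0, 0.1)     # noisy observed cost

        # Critic step: exactly one update per iteration, no inner loop.
        q[a] += 0.1 * (c - q[a])

        # Actor step: score-function gradient sample using the critic.
        score = (1.0 - p1) if a == 1 else -p1  # d/dtheta of log pi(a)
        g_new = score * q[a]

        # Recursive averaging mixes the new sample with older ones
        # (assumed decaying weight, chosen only for this sketch).
        alpha = 1.0 / t ** 0.6
        g = (1.0 - alpha) * g + alpha * g_new

        theta -= 0.5 * g  # gradient descent on the expected cost
    return theta, q


if __name__ == "__main__":
    theta, q = run_single_loop()
    print("final logit:", theta)
    print("critic estimates:", q)
```

Because the critic is only partially trained when the actor uses it, each gradient sample is biased, which is the situation the paper's convergence analysis addresses. In this toy problem the bias is harmless and the policy still drifts toward the cheaper action.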

research
08/02/2020

Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy

We study the global convergence and global optimality of actor-critic, o...
research
09/25/2021

Stackelberg Actor-Critic: Game-Theoretic Reinforcement Learning Algorithms

The hierarchical interaction between the actor and critic in actor-criti...
research
07/10/2020

A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic

This paper analyzes a two-timescale stochastic algorithm for a class of ...
research
05/26/2021

Successive Convex Approximation Based Off-Policy Optimization for Constrained Reinforcement Learning

We propose a successive convex approximation based off-policy optimizati...
research
12/25/2022

Novel Reinforcement Learning Algorithm for Suppressing Synchronization in Closed Loop Deep Brain Stimulators

Parkinson's disease is marked by altered and increased firing characteri...
research
10/26/2020

Lyapunov-Based Reinforcement Learning State Estimator

In this paper, we consider the state estimation problem for nonlinear st...
research
10/28/2021

Bayesian Sequential Optimal Experimental Design for Nonlinear Models Using Policy Gradient Reinforcement Learning

We present a mathematical framework and computational methods to optimal...
