On the Global Convergence of Natural Actor-Critic with Two-layer Neural Network Parametrization

06/18/2023
by   Mudit Gaur, et al.

Actor-critic algorithms have shown remarkable success in solving state-of-the-art decision-making problems. Despite their empirical effectiveness, their theoretical underpinnings remain relatively unexplored, especially with neural network parametrization. In this paper, we study a natural actor-critic algorithm that uses neural networks to represent the critic. Our aim is to establish sample complexity guarantees for this algorithm and thereby gain a deeper understanding of its performance characteristics. To that end, we propose a Natural Actor-Critic algorithm with 2-Layer critic parametrization (NAC2L). Our approach estimates the Q-function in each iteration by solving a convex optimization problem. We establish that the proposed approach attains a sample complexity of 𝒪̃(1/(ϵ^4 (1-γ)^4)). In contrast, the existing sample complexity results in the literature hold only for a tabular or linear MDP. Our result holds for countable state spaces and does not require a linear or low-rank structure on the MDP.
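To get a feel for how the stated bound scales, the following minimal Python sketch evaluates the leading term 1/(ϵ^4 (1-γ)^4). It is an illustration only: the function name `sample_complexity_scale` is hypothetical, and the constants and logarithmic factors hidden by the 𝒪̃ notation are ignored, so the result is a relative scale rather than an actual sample count.

```python
def sample_complexity_scale(eps: float, gamma: float) -> float:
    """Leading term 1 / (eps^4 * (1 - gamma)^4) of the stated bound.

    Assumption: constants and log factors hidden by the O-tilde are
    dropped, so this is only a relative scale, not a sample count.
    """
    if eps <= 0:
        raise ValueError("eps (target accuracy) must be positive")
    if not 0 <= gamma < 1:
        raise ValueError("gamma (discount factor) must lie in [0, 1)")
    return 1.0 / (eps ** 4 * (1.0 - gamma) ** 4)


if __name__ == "__main__":
    # Compare how the scale grows as eps shrinks or gamma approaches 1.
    for eps, gamma in [(0.1, 0.9), (0.05, 0.9), (0.1, 0.95)]:
        scale = sample_complexity_scale(eps, gamma)
        print(f"eps={eps}, gamma={gamma}: scale ~ {scale:.3e}")
```

Under these assumptions, halving ϵ multiplies the scale by 16, and halving the effective horizon gap 1-γ does the same.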

research
11/14/2022

On the Global Convergence of Fitted Q-Iteration with Two-layer Neural Network Parametrization

Deep Q-learning based algorithms have been applied successfully in many ...
research
06/02/2022

Finite-Time Analysis of Entropy-Regularized Neural Natural Actor-Critic Algorithm

Natural actor-critic (NAC) and its variants, equipped with the represent...
research
01/31/2022

Single Time-scale Actor-critic Method to Solve the Linear Quadratic Regulator with Convergence Guarantees

We propose a single time-scale actor-critic algorithm to solve the linea...
research
08/18/2022

Global Convergence of Two-timescale Actor-Critic for Solving Linear Quadratic Regulator

The actor-critic (AC) reinforcement learning algorithms have been the po...
research
12/31/2020

Asynchronous Advantage Actor Critic: Non-asymptotic Analysis and Linear Speedup

Asynchronous and parallel implementation of standard reinforcement learn...
research
02/27/2018

Robust Actor-Critic Contextual Bandit for Mobile Health (mHealth) Interventions

We consider the actor-critic contextual bandit for the mobile health (mH...
research
11/03/2020

Intrinsic Robotic Introspection: Learning Internal States From Neuron Activations

We present an introspective framework inspired by the process of how hum...
