Wasserstein Flow Meets Replicator Dynamics: A Mean-Field Analysis of Representation Learning in Actor-Critic

12/27/2021
by   Yufeng Zhang, et al.
17

Actor-critic (AC) algorithms, empowered by neural networks, have had significant empirical success in recent years. However, most of the existing theoretical support for AC algorithms focuses on the case of linear function approximations, or linearized neural networks, where the feature representation is fixed throughout training. Such a limitation fails to capture the key aspect of representation learning in neural AC, which is pivotal in practical problems. In this work, we take a mean-field perspective on the evolution and convergence of feature-based neural AC. Specifically, we consider a version of AC where the actor and critic are represented by overparameterized two-layer neural networks and are updated with two-timescale learning rates. The critic is updated by temporal-difference (TD) learning with a larger stepsize while the actor is updated via proximal policy optimization (PPO) with a smaller stepsize. In the continuous-time and infinite-width limiting regime, when the timescales are properly separated, we prove that neural AC finds the globally optimal policy at a sublinear rate. Additionally, we prove that the feature representation induced by the critic network is allowed to evolve within a neighborhood of the initial one.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
08/02/2020

Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy

We study the global convergence and global optimality of actor-critic, o...
research
06/08/2020

Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory

Temporal-difference and Q-learning play a key role in deep reinforcement...
research
03/13/2023

Actor-Critic learning for mean-field control in continuous time

We study policy gradient for mean-field control in continuous time in a ...
research
11/20/2018

Representation Learning of Pedestrian Trajectories Using Actor-Critic Sequence-to-Sequence Autoencoder

Representation learning of pedestrian trajectories transforms variable-l...
research
09/08/2023

Actor critic learning algorithms for mean-field control with moment neural networks

We develop a new policy gradient and actor-critic algorithm for solving ...
research
07/10/2020

A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic

This paper analyzes a two-timescale stochastic algorithm for a class of ...
research
06/02/2022

Finite-Time Analysis of Entropy-Regularized Neural Natural Actor-Critic Algorithm

Natural actor-critic (NAC) and its variants, equipped with the represent...

Please sign up or login with your details

Forgot password? Click here to reset