Global optimality of softmax policy gradient with single hidden layer neural networks in the mean-field regime

10/22/2020
by   Andrea Agazzi, et al.
0

We study the problem of policy optimization for infinite-horizon discounted Markov Decision Processes with softmax policy and nonlinear function approximation trained with policy gradient algorithms. We concentrate on the training dynamics in the mean-field regime, modeling e.g., the behavior of wide single hidden layer neural networks, when exploration is encouraged through entropy regularization. The dynamics of these models is established as a Wasserstein gradient flow of distributions in parameter space. We further prove global optimality of the fixed points of this dynamics under mild conditions on their initialization.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
01/18/2022

Convergence of policy gradient for entropy regularized MDPs with neural network approximation in the mean-field regime

We study the global convergence of policy gradient for infinite-horizon,...
research
03/12/2023

Global Optimality of Elman-type RNN in the Mean-Field Regime

We analyze Elman-type Recurrent Reural Networks (RNNs) and their trainin...
research
06/06/2019

Classical Policy Gradient: Preserving Bellman's Principle of Optimality

We propose a new objective function for finite-horizon episodic Markov d...
research
02/22/2021

Softmax Policy Gradient Methods Can Take Exponential Time to Converge

The softmax policy gradient (PG) method, which performs gradient ascent ...
research
02/05/2020

A mean-field theory of lazy training in two-layer neural nets: entropic regularization and controlled McKean-Vlasov dynamics

We consider the problem of universal approximation of functions by two-l...
research
06/08/2020

Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory

Temporal-difference and Q-learning play a key role in deep reinforcement...
research
06/19/2020

An analytic theory of shallow networks dynamics for hinge loss classification

Neural networks have been shown to perform incredibly well in classifica...

Please sign up or login with your details

Forgot password? Click here to reset