On the Expected Dynamics of Nonlinear TD Learning

05/29/2019
by David Brandfonbrener, et al.

While there are convergence guarantees for temporal difference (TD) learning with linear function approximators, the situation for nonlinear models is far less understood, and divergent examples are known. Here we take a first step towards extending theoretical convergence guarantees to TD learning with nonlinear function approximation. More precisely, we consider the expected dynamics of the TD(0) algorithm. We prove that the resulting ODE is attracted to a compact set for smooth homogeneous functions, including some ReLU networks. For over-parametrized and well-conditioned functions in sufficiently reversible environments, we prove convergence to the global optimum. This result improves further when k-step or λ-returns are used. Finally, we generalize a known divergent counterexample into a family of divergent problems in order to motivate the assumptions needed to prove convergence.
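Although the abstract does not spell it out, the expected TD(0) dynamics it refers to are conventionally written as an ODE over the approximator's parameters θ. The sketch below uses standard notation assumed here rather than taken from the paper: V_θ is the approximate value function, μ the on-policy state distribution, P the transition kernel, r the reward, and γ the discount factor.

\[
\dot{\theta}_t \;=\; \mathbb{E}_{s \sim \mu,\; s' \sim P(\cdot \mid s)}\!\left[\big(r(s, s') + \gamma\, V_{\theta_t}(s') - V_{\theta_t}(s)\big)\, \nabla_\theta V_{\theta_t}(s)\right].
\]

When V_θ is nonlinear, this vector field is in general not the gradient of any scalar objective, so convergence cannot be argued with standard gradient-descent tools; this is the gap that the attraction, over-parametrization, and reversibility assumptions mentioned above are meant to close.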


Related research

Local SGD Optimizes Overparameterized Neural Networks in Polynomial Time (07/22/2021)
In this paper we prove that Local (S)GD (or FedAvg) can optimize two-lay...

Temporal-difference learning for nonlinear value function approximation in the lazy training regime (05/27/2019)
We discuss the approximation of the value function for infinite-horizon ...

Blow-up of semi-discrete solution of a nonlinear parabolic equation with gradient term (10/17/2020)
This paper is concerned with approximation of blow-up phenomena in nonli...

Approximate Temporal Difference Learning is a Gradient Descent for Reversible Policies (05/02/2018)
In reinforcement learning, temporal difference (TD) is the most direct a...

Nonlinear Distributional Gradient Temporal-Difference Learning (05/20/2018)
We devise a distributional variant of gradient temporal-difference (TD) ...

Provably Efficient Gauss-Newton Temporal Difference Learning Method with Function Approximation (02/25/2023)
In this paper, based on the spirit of Fitted Q-Iteration (FQI), we propo...

Efficient learning of smooth probability functions from Bernoulli tests with guarantees (12/11/2018)
We study the fundamental problem of learning an unknown, smooth probabil...
