Provably Efficient Gauss-Newton Temporal Difference Learning Method with Function Approximation

02/25/2023 ∙ by Zhifa Ke, et al.
In this paper, inspired by Fitted Q-Iteration (FQI), we propose a Gauss-Newton Temporal Difference (GNTD) method to solve the Q-value estimation problem with function approximation. In each iteration, instead of fully solving the nonlinear least-squares subproblem that FQI uses to fit the Q-iteration, GNTD can be viewed as an inexact FQI that takes only a single Gauss-Newton step on this subproblem, which is much cheaper computationally. Compared to popular Temporal Difference (TD) learning, which can be viewed as taking a single gradient descent step on FQI's subproblem per iteration, the Gauss-Newton step of GNTD better preserves the structure of FQI and hence leads to better convergence. We derive finite-sample, non-asymptotic convergence guarantees for GNTD under linear, neural network, and general smooth function approximations. In particular, recent works on neural TD only guarantee a suboptimal 𝒪(ε^-4) sample complexity, while GNTD attains an improved 𝒪̃(ε^-2) complexity. Finally, we validate our method via extensive experiments on both online and offline RL problems, where GNTD exhibits both higher rewards and faster convergence than TD-type methods, including DQN.
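To make the contrast concrete, below is a minimal NumPy sketch of one GNTD iteration under linear function approximation, next to the corresponding semi-gradient TD step. This is an illustration of the general idea, not the paper's implementation: the function names (gntd_step, td_step), the damping λ, and the step size η are assumptions chosen for the sketch.

```python
import numpy as np

def gntd_step(theta, phi, phi_next, rewards, gamma=0.99, eta=1.0, lam=1e-3):
    """One damped Gauss-Newton step on the FQI least-squares subproblem
    for a linear Q-function Q(s, a) = phi(s, a) @ theta.

    phi:      (n, d) features of sampled (s, a) pairs
    phi_next: (n, d) features of the greedy next (s', a') pairs
    rewards:  (n,)   sampled rewards
    """
    # FQI-style target: y is held fixed at the current iterate theta.
    y = rewards + gamma * phi_next @ theta
    residual = phi @ theta - y            # r(theta) = Phi @ theta - y
    J = phi                               # Jacobian of the residual in theta
    # Damped Gauss-Newton direction: (J^T J + lam I)^{-1} J^T r
    H = J.T @ J + lam * np.eye(J.shape[1])
    return theta - eta * np.linalg.solve(H, J.T @ residual)

def td_step(theta, phi, phi_next, rewards, gamma=0.99, eta=0.05):
    """One semi-gradient TD(0) step on the same subproblem, for contrast:
    a plain gradient step with no curvature (J^T J) correction."""
    y = rewards + gamma * phi_next @ theta
    residual = phi @ theta - y
    return theta - eta * phi.T @ residual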

Related research

∙ 07/06/2023 ∙ Global q-superlinear convergence of the infinite-dimensional Newton's method for the regularized p-Stokes equations
The motion of glaciers can be simulated with the p-Stokes equations. We ...

∙ 10/27/2020 ∙ Temporal Difference Learning as Gradient Splitting
Temporal difference learning with linear function approximation is a pop...

∙ 11/29/2018 ∙ Convergence Analysis of a Cooperative Diffusion Gauss-Newton Strategy
In this paper, we investigate the convergence performance of a cooperati...

∙ 07/13/2020 ∙ Inverse Cubic Iteration
There are thousands of papers on rootfinding for nonlinear scalar equati...

∙ 08/15/2023 ∙ The L_q-weighted dual programming of the linear Chebyshev approximation and an interior-point method
Given samples of a real or complex-valued function on a set of distinct ...

∙ 08/02/2021 ∙ Computing the Newton-step faster than Hessian accumulation
Computing the Newton-step of a generic function with N decision variable...

∙ 05/29/2019 ∙ On the Expected Dynamics of Nonlinear TD Learning
While there are convergence guarantees for temporal difference (TD) lear...
