Target Network and Truncation Overcome The Deadly Triad in Q-Learning

03/05/2022
by   Zaiwei Chen, et al.

Q-learning with function approximation is one of the most empirically successful yet theoretically mysterious reinforcement learning (RL) algorithms, and Sutton (1999) identified it as one of the most important open theoretical problems in the RL community. Even in the basic linear function approximation setting, there are well-known examples of divergence. In this work, we propose a stable design for Q-learning with linear function approximation that uses a target network and truncation, and we establish finite-sample guarantees for it. Our result implies an 𝒪(ϵ^{-2}) sample complexity up to a function approximation error. This is the first variant of Q-learning with linear function approximation that is provably stable without requiring strong assumptions or modifying the problem parameters, and that achieves the optimal sample complexity.
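To make the two ingredients named in the abstract concrete, here is a minimal sketch of how a target network and truncation might be combined in Q-learning with linear function approximation. It is an illustration under stated assumptions, not the authors' algorithm: the function name `q_learning_target_truncation`, the `sample` and `features` callables, the constant step size, the warm start of the inner loop, and the truncation bound `r_max / (1 - gamma)` are all hypothetical choices made for this example.

```python
def truncate(x, bound):
    """Clip x to the interval [-bound, bound]."""
    return max(-bound, min(bound, x))


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def q_learning_target_truncation(sample, features, actions, dim, gamma=0.9,
                                 r_max=1.0, outer_iters=20, inner_iters=500,
                                 step_size=0.05):
    """Hypothetical sketch: linear Q-learning with a target network and truncation.

    sample() returns one transition (s, a, reward, s_next).
    features(s, a) returns a list of `dim` floats.
    """
    # Assumption: rewards lie in [-r_max, r_max], so no true Q-value
    # can exceed this magnitude.
    bound = r_max / (1.0 - gamma)
    theta_target = [0.0] * dim          # frozen target-network weights
    for _ in range(outer_iters):
        theta = list(theta_target)      # assumption: warm-start the inner loop
        for _ in range(inner_iters):
            s, a, reward, s_next = sample()
            # Bootstrap from the frozen target weights and truncate each
            # estimate before taking the max over next actions.
            next_value = max(
                truncate(dot(features(s_next, b), theta_target), bound)
                for b in actions)
            phi = features(s, a)
            td_error = reward + gamma * next_value - dot(phi, theta)
            theta = [w + step_size * td_error * x for w, x in zip(theta, phi)]
        theta_target = theta            # synchronize the target network
    return theta_target


# Usage: one state, one action, reward 1, gamma 0.5, so the true Q-value is 2.
theta = q_learning_target_truncation(
    sample=lambda: (0, 0, 1.0, 0), features=lambda s, a: [1.0],
    actions=[0], dim=1, gamma=0.5)
print(theta[0])  # approaches 2 as the outer iterations proceed
```

In this sketch the target weights are frozen during each inner loop, so the inner update is a regression toward a fixed target, and truncation keeps the bootstrapped value bounded even if the linear estimate is not.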

research
07/11/2019

Provably Efficient Reinforcement Learning with Linear Function Approximation

Modern Reinforcement Learning (RL) is commonly applied to practical prob...
research
02/20/2023

Reinforcement Learning with Function Approximation: From Linear to Nonlinear

Function approximation has been an indispensable component in modern rei...
research
03/22/2022

A Note on Target Q-learning For Solving Finite MDPs with A Generative Oracle

Q-learning with function approximation could diverge in the off-policy s...
research
07/14/2021

Going Beyond Linear RL: Sample Efficient Neural Function Approximation

Deep Reinforcement Learning (RL) powered by neural net approximation of ...
research
10/26/2020

The sample complexity of level set approximation

We study the problem of approximating the level set of an unknown functi...
research
03/19/2021

Bilinear Classes: A Structural Framework for Provable Generalization in RL

This work introduces Bilinear Classes, a new structural framework, which...
research
06/01/2022

Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning

The Q-learning algorithm is a simple and widely-used stochastic approxim...
