Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning

06/01/2022
by Andrea Zanette, et al.

The Q-learning algorithm is a simple and widely used stochastic approximation scheme for reinforcement learning, but the basic protocol can exhibit instability in conjunction with function approximation. Such instability can be observed even with linear function approximation. In practice, tools such as target networks and experience replay appear to be essential, but the individual contribution of each of these mechanisms is not well understood theoretically. This work proposes an exploration variant of the basic Q-learning protocol with linear function approximation. Our modular analysis illustrates the role played by each algorithmic tool that we adopt: a second-order update rule, a set of target networks, and a mechanism akin to experience replay. Together, they enable state-of-the-art regret bounds on linear MDPs while preserving the most prominent feature of the algorithm, namely a space complexity independent of the number of steps elapsed. We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error. The algorithm also exhibits a form of instance-dependence, in that its performance depends on the "effective" feature dimension.
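The three ingredients named in the abstract can be illustrated together on a toy problem. The sketch below is a hypothetical, simplified construction (not the paper's actual algorithm): a small MDP with orthonormal features so that the linear class represents Q exactly, an epsilon-greedy behavior policy, a frozen target network refreshed periodically, and a second-order (ridge least-squares) refit over the whole replay buffer at each refresh. All names and constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-state, 2-action MDP (hypothetical numbers, for illustration only).
n_s, n_a, d = 2, 2, 4
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.1, 0.9]]])   # P[s, a] = next-state distribution
R = np.array([[1.0, 0.0], [0.0, 1.0]])     # deterministic reward R[s, a]
gamma = 0.9

# Orthonormal features phi(s, a) in R^d, so the linear class represents
# Q exactly -- the kind of realizability the linear-MDP setting provides.
basis, _ = np.linalg.qr(rng.normal(size=(d, d)))
phi = basis.reshape(n_s, n_a, d)

w = np.zeros(d)        # online weights: Q_w(s, a) = phi(s, a) @ w
w_tgt = w.copy()       # frozen target-network weights
replay = []            # experience-replay buffer of (s, a, r, s') tuples

s = 0
for t in range(6000):
    # epsilon-greedy behavior policy
    a = int(rng.integers(n_a)) if rng.random() < 0.1 \
        else int(np.argmax(phi[s] @ w))
    s_next = int(rng.choice(n_s, p=P[s, a]))
    replay.append((s, a, R[s, a], s_next))
    s = s_next

    # Every K steps: refresh the target network, then refit the weights by
    # ridge regression (a second-order update) over the whole replay buffer,
    # recomputing every bootstrap target with the frozen weights.
    if (t + 1) % 200 == 0:
        w_tgt = w.copy()
        A, b = np.eye(d), np.zeros(d)      # regularized Gram matrix, moments
        for si, ai, r, sj in replay:
            x = phi[si, ai]
            y = r + gamma * np.max(phi[sj] @ w_tgt)   # target-network bootstrap
            A += np.outer(x, x)
            b += x * y
        w = np.linalg.solve(A, b)

q_hat = np.array([phi[si] @ w for si in range(n_s)])
print(q_hat)   # learned action-values; approaches Q* for this MDP
```

For this symmetric MDP the optimal values are Q*(0,0) = Q*(1,1) = 10 and Q*(0,1) = Q*(1,0) = 9, so the learned table should settle near those numbers. Freezing `w_tgt` between refits makes each refit a contraction toward the Bellman target, while the least-squares solve removes the step-size sensitivity of first-order Q-learning; both effects mirror the roles the abstract assigns to target networks and the second-order update. Note that, unlike this sketch, the algorithm analyzed in the paper keeps its memory footprint independent of the number of steps elapsed.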


Related research

- Online Target Q-learning with Reverse Experience Replay: Efficiently finding the Optimal Policy for Linear MDPs (10/16/2021)
- Target Network and Truncation Overcome The Deadly Triad in Q-Learning (03/05/2022)
- Improved Regret Bound and Experience Replay in Regularized Policy Iteration (02/25/2021)
- Provably Efficient Reinforcement Learning with Linear Function Approximation (07/11/2019)
- Provably Efficient Q-learning with Function Approximation via Distribution Shift Error Checking Oracle (06/14/2019)
- Paused Agent Replay Refresh (09/26/2022)
- Optimal Use of Experience in First Person Shooter Environments (06/24/2019)
