Is Q-learning Provably Efficient?

07/10/2018
by Chi Jin et al.

Model-free reinforcement learning (RL) algorithms, such as Q-learning, directly parameterize and update value functions or policies without explicitly modeling the environment. They are typically simpler, more flexible to use, and thus more prevalent in modern deep RL than model-based approaches. However, empirical work has suggested that model-free algorithms may require more samples to learn [Deisenroth and Rasmussen 2011, Schulman et al. 2015]. The theoretical question of "whether model-free algorithms can be made sample efficient" is one of the most fundamental questions in RL, and remains unsolved even in the basic scenario with finitely many states and actions. We prove that, in an episodic MDP setting, Q-learning with UCB exploration achieves regret Õ(√(H^3 SAT)), where S and A are the numbers of states and actions, H is the number of steps per episode, and T is the total number of steps. This sample efficiency matches the optimal regret that can be achieved by any model-based approach, up to a single √(H) factor. To the best of our knowledge, this is the first analysis in the model-free setting that establishes √(T) regret without requiring access to a "simulator."
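The algorithm analyzed in the abstract, Q-learning with UCB exploration, can be sketched in a few lines: maintain optimistically initialized per-step Q-tables, act greedily, and update with a step-dependent learning rate α_t = (H+1)/(H+t) and a Hoeffding-style bonus on the order of √(H³/t). The sketch below is a minimal illustration under assumed inputs (a known reward table `R`, transition tensor `P`, a fixed initial state, and a hypothetical bonus scale `c`), not the paper's exact pseudocode:

```python
import numpy as np

def q_learning_ucb(P, R, H, K, c=0.1, seed=0):
    """Tabular Q-learning with UCB (Hoeffding-style) bonuses in an episodic MDP.

    P: (S, A, S) transition probabilities; R: (S, A) rewards in [0, 1];
    H: steps per episode; K: number of episodes; c: bonus scale (assumed).
    Returns the learned Q-tables and the total reward collected.
    """
    rng = np.random.default_rng(seed)
    S, A = R.shape
    Q = np.full((H + 1, S, A), float(H))   # optimistic initialization at H
    Q[H] = 0.0                             # value after the last step is 0
    N = np.zeros((H, S, A), dtype=int)     # visit counts per (step, state, action)
    iota = np.log(S * A * H * K)           # log factor inside the bonus
    total_reward = 0.0
    for _ in range(K):
        s = 0                              # fixed initial state (assumed)
        for h in range(H):
            a = int(np.argmax(Q[h, s]))    # act greedily w.r.t. optimistic Q
            r = R[s, a]
            s_next = rng.choice(S, p=P[s, a])
            total_reward += r
            N[h, s, a] += 1
            t = N[h, s, a]
            alpha = (H + 1) / (H + t)      # learning rate from the analysis
            bonus = c * np.sqrt(H**3 * iota / t)
            target = r + min(H, Q[h + 1, s_next].max()) + bonus
            Q[h, s, a] = (1 - alpha) * Q[h, s, a] + alpha * target
            s = s_next
    return Q, total_reward
```

The optimistic initialization and the decaying bonus together drive exploration without any model of `P`; only Q-values and visit counts are stored, which is what makes the method model-free.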


