Minimax Regret Bounds for Reinforcement Learning

03/16/2017
by   Mohammad Gheshlaghi Azar, et al.
0

We consider the problem of provably optimal exploration in reinforcement learning for finite horizon MDPs. We show that an optimistic modification to value iteration achieves a regret bound of Õ( √(HSAT) + H^2S^2A+H√(T)) where H is the time horizon, S the number of states, A the number of actions and T the number of time-steps. This result improves over the best previous known bound Õ(HS √(AT)) achieved by the UCRL2 algorithm of Jaksch et al., 2010. The key significance of our new results is that when T≥ H^3S^3A and SA≥ H, it leads to a regret of Õ(√(HSAT)) that matches the established lower bound of Ω(√(HSAT)) up to a logarithmic factor. Our analysis contains two key insights. We use careful application of concentration inequalities to the optimal value function as a whole, rather than to the transitions probabilities (to improve scaling in S), and we define Bernstein-based "exploration bonuses" that use the empirical variance of the estimated values at the next states (to improve scaling in H).

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/12/2019

Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function

We present an algorithm based on the Optimism in the Face of Uncertainty...
research
02/26/2018

Variance Reduction Methods for Sublinear Reinforcement Learning

This work considers the problem of provably optimal reinforcement learni...
research
05/16/2022

From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses

We propose the Bayes-UCBVI algorithm for reinforcement learning in tabul...
research
07/25/2018

Variational Bayesian Reinforcement Learning with Regret Bounds

We consider the exploration-exploitation trade-off in reinforcement lear...
research
02/07/2023

Layered State Discovery for Incremental Autonomous Exploration

We study the autonomous exploration (AX) problem proposed by Lim Aue...
research
09/22/2020

Is Q-Learning Provably Efficient? An Extended Analysis

This work extends the analysis of the theoretical results presented with...
research
06/01/2020

Model-Based Reinforcement Learning with Value-Targeted Regression

This paper studies model-based reinforcement learning (RL) for regret mi...

Please sign up or login with your details

Forgot password? Click here to reset