Minimax Optimal Reinforcement Learning for Discounted MDPs

10/01/2020
by Jiafan He, et al.

We study the reinforcement learning problem for discounted Markov Decision Processes (MDPs) in the tabular setting. We propose a model-based algorithm named UCBVI-γ, which is based on the principle of optimism in the face of uncertainty and a Bernstein-type bonus. It achieves Õ(√(SAT)/(1-γ)^1.5) regret, where S is the number of states, A is the number of actions, γ is the discount factor, and T is the number of steps. In addition, we construct a class of hard MDPs and show that for any algorithm, the expected regret is at least Ω̃(√(SAT)/(1-γ)^1.5). Our upper bound matches the minimax lower bound up to logarithmic factors, which suggests that UCBVI-γ is nearly minimax optimal for discounted MDPs.
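The abstract does not spell out the algorithm, but the ingredients it names (optimistic value iteration on an empirical model with a Bernstein-type, variance-aware bonus) can be sketched roughly as follows. This is a minimal illustration under assumed details, not the paper's exact method: rewards are assumed to lie in [0, 1], `sample_env(s, a)` is a hypothetical callback for the unknown MDP, and the bonus constants and update schedule are placeholders.

```python
import numpy as np

def ucbvi_gamma_sketch(S, A, gamma, T, sample_env, delta=0.05):
    """Illustrative optimistic value iteration with a Bernstein-style bonus.

    Assumptions (not from the paper): rewards in [0, 1], `sample_env(s, a)`
    returns (reward, next_state), and the bonus constants are placeholders.
    """
    H = 1.0 / (1.0 - gamma)                  # effective horizon, also V_max for rewards in [0, 1]
    counts = np.ones((S, A))                 # visit counts (start at 1 to avoid division by zero)
    trans_counts = np.ones((S, A, S)) / S    # smoothed transition counts (uniform prior)
    reward_sum = np.zeros((S, A))
    Q = np.full((S, A), H)                   # optimistic initialization at V_max
    state = 0

    for t in range(T):
        # Act greedily with respect to the current optimistic Q estimate.
        action = int(np.argmax(Q[state]))
        reward, next_state = sample_env(state, action)

        # Update the empirical model.
        counts[state, action] += 1
        trans_counts[state, action, next_state] += 1
        reward_sum[state, action] += reward

        # Periodically recompute the optimistic Q via value iteration.
        if (t + 1) % max(1, T // 100) == 0:
            P_hat = trans_counts / trans_counts.sum(axis=2, keepdims=True)
            r_hat = reward_sum / counts
            V = Q.max(axis=1)
            for _ in range(int(np.ceil(H * np.log(T + 1)))):
                EV = P_hat @ V                          # empirical expectation of V at (s, a)
                var_V = P_hat @ (V ** 2) - EV ** 2      # empirical variance of V at (s, a)
                log_term = np.log(S * A * (t + 1) / delta)
                # Bernstein-style bonus: variance-dependent term plus a lower-order term.
                bonus = (np.sqrt(2 * np.clip(var_V, 0, None) * log_term / counts)
                         + H * log_term / counts)
                Q = np.minimum(r_hat + bonus + gamma * EV, H)
                V = Q.max(axis=1)
        state = next_state
    return Q
```

The variance term in the bonus is what distinguishes a Bernstein-type construction from a plain Hoeffding-type bonus; it is the standard route to improving the dependence on the effective horizon 1/(1-γ), which is consistent with the (1-γ)^1.5 factor in the stated regret bound.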


research
02/12/2020

Regret Bounds for Discounted MDPs

Recently, it has been shown that carefully designed reinforcement learni...
research
10/20/2022

Horizon-Free Reinforcement Learning for Latent Markov Decision Processes

We study regret minimization for reinforcement learning (RL) in Latent M...
research
06/24/2020

Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes

We study minimax optimal reinforcement learning in episodic factored Mar...
research
11/03/2019

Problem Dependent Reinforcement Learning Bounds Which Can Identify Bandit Structure in MDPs

In order to make good decision under uncertainty an agent must learn fro...
research
08/06/2018

Regret Bounds for Reinforcement Learning via Markov Chain Concentration

We give a simple optimistic algorithm for which it is easy to derive reg...
research
03/25/2017

Exploration--Exploitation in MDPs with Options

While a large body of empirical results show that temporally-extended ac...
research
06/21/2020

On Optimism in Model-Based Reinforcement Learning

The principle of optimism in the face of uncertainty is prevalent throug...
