Randomized Exploration is Near-Optimal for Tabular MDP

02/19/2021
by   Zhihan Xiong, et al.
0

We study exploration using randomized value functions in Thompson Sampling (TS)-like algorithms in reinforcement learning. This type of algorithms enjoys appealing empirical performance. We show that when we use 1) a single random seed in each episode, and 2) a Bernstein-type magnitude of noise, we obtain a worst-case O(H√(SAT)) regret bound for episodic time-inhomogeneous Markov Decision Process where S is the size of state space, A is the size of action space, H is the planning horizon and T is the number of interactions. This bound polynomially improves all existing bounds for TS-like algorithms based on randomized value functions, and for the first time, matches the Ω(H√(SAT)) lower bound up to logarithmic factors. Our result highlights that randomized exploration can be near-optimal, which was previously only achieved by optimistic algorithms.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/07/2019

Worst-Case Regret Bounds for Exploration via Randomized Value Functions

This paper studies a recent proposal to use randomized value functions t...
research
10/23/2020

Improved Worst-Case Regret Bounds for Randomized Least-Squares Value Iteration

This paper studies regret minimization with randomized value functions i...
research
05/27/2019

Near-optimal Optimistic Reinforcement Learning using Empirical Bernstein Inequalities

We study model-based reinforcement learning in an unknown finite communi...
research
01/27/2019

Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP

A fundamental question in reinforcement learning is whether model-free a...
research
01/09/2023

Exploration in Model-based Reinforcement Learning with Randomized Reward

Model-based Reinforcement Learning (MBRL) has been widely adapted due to...
research
11/03/2020

Search Problems in Trees with Symmetries: near optimal traversal strategies for individualization-refinement algorithms

We define a search problem on trees that closely captures the backtracki...
research
06/20/2019

Near-optimal Reinforcement Learning using Bayesian Quantiles

We study model-based reinforcement learning in finite communicating Mark...

Please sign up or login with your details

Forgot password? Click here to reset