A Fully Problem-Dependent Regret Lower Bound for Finite-Horizon MDPs

06/24/2021
by Andrea Tirinzoni, et al.

We derive a novel asymptotic problem-dependent lower bound for regret minimization in finite-horizon tabular Markov Decision Processes (MDPs). As in prior work (e.g., for ergodic MDPs), the lower bound is the solution to an optimization problem, but our derivation reveals the need for an additional constraint on the visitation distribution over state-action pairs that explicitly accounts for the dynamics of the MDP. We characterize the lower bound through a series of examples illustrating how different MDPs may have significantly different complexity. 1) We first consider a "difficult" MDP instance, where the novel dynamics-based constraint leads to a larger lower bound (i.e., a larger regret) than the classical analysis. 2) We then show that our lower bound recovers results previously derived for specific MDP instances. 3) Finally, we show that, in certain "simple" MDPs, the lower bound is considerably smaller than in the general case and does not scale with the minimum action gap at all. We show that this last result is attainable (up to poly(H) terms, where H is the horizon) by providing a regret upper bound based on policy gaps for an optimistic algorithm.
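The abstract does not spell out the dynamics-based constraint. As a rough illustration only, the sketch below uses the standard flow-conservation condition for finite-horizon occupancy measures: the mass entering a state at stage h+1 must equal the mass leaving it. This is an assumption about the flavor of the constraint, not the paper's exact formulation. The two-state MDP, the uniform policy, and the helper names `visitation` and `flow_violation` are all hypothetical.

```python
# Toy finite-horizon tabular MDP (all numbers are made up for illustration).
# P[s][a][s2]: probability of moving from state s to state s2 under action a.
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.0, 1.0]],
]
mu0 = [1.0, 0.0]  # initial state distribution
H = 3             # horizon
# pi[h][s][a]: probability of taking action a in state s at stage h (uniform here).
pi = [[[0.5, 0.5], [0.5, 0.5]] for _ in range(H)]


def visitation(P, pi, mu0, H):
    """Return rho[h][s][a], the probability of visiting (s, a) at stage h under pi."""
    S, A = len(mu0), len(P[0])
    rho = []
    state_dist = list(mu0)
    for h in range(H):
        rho_h = [[state_dist[s] * pi[h][s][a] for a in range(A)] for s in range(S)]
        rho.append(rho_h)
        # Push the stage-h mass through the transition kernel.
        state_dist = [
            sum(rho_h[s][a] * P[s][a][s2] for s in range(S) for a in range(A))
            for s2 in range(S)
        ]
    return rho


def flow_violation(P, rho, mu0):
    """Largest gap between inflow and outflow over all stages and states."""
    S, A = len(mu0), len(P[0])
    worst = 0.0
    for h, rho_h in enumerate(rho):
        for s2 in range(S):
            if h == 0:
                inflow = mu0[s2]
            else:
                inflow = sum(
                    rho[h - 1][s][a] * P[s][a][s2]
                    for s in range(S)
                    for a in range(A)
                )
            outflow = sum(rho_h[s2])
            worst = max(worst, abs(inflow - outflow))
    return worst


# A distribution induced by an actual policy satisfies the flow constraint.
rho = visitation(P, pi, mu0, H)
print("policy-induced violation:", flow_violation(P, rho, mu0))

# An arbitrary reallocation of visits (all stage-1 mass on state 1) does not,
# because the dynamics cannot deliver that much mass to state 1.
bad = [[row[:] for row in stage] for stage in rho]
bad[1] = [[0.0, 0.0], [0.5, 0.5]]
print("arbitrary allocation violation:", flow_violation(P, bad, mu0))
```

The point of the sketch is that an optimization over visitation counts that ignores the dynamics may pick allocations like `bad`, which no policy can realize. Restricting the feasible set to flow-consistent distributions rules these out, which is one plausible reading of why such a constraint can enlarge the lower bound.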


research
10/20/2022

Horizon-Free Reinforcement Learning for Latent Markov Decision Processes

We study regret minimization for reinforcement learning (RL) in Latent M...
research
02/09/2021

Fine-Grained Gap-Dependent Bounds for Tabular MDPs via Adaptive Multi-Step Bootstrap

This paper presents a new model-free algorithm for episodic finite-horiz...
research
06/27/2021

Regret Analysis in Deterministic Reinforcement Learning

We consider Markov Decision Processes (MDPs) with deterministic transiti...
research
11/28/2019

Analysis of Lower Bounds for Simple Policy Iteration

Policy iteration is a family of algorithms that are used to find an opti...
research
02/09/2021

RL for Latent MDPs: Regret Guarantees and a Lower Bound

In this work, we consider the regret minimization problem for reinforcem...
research
05/23/2018

Representation Balancing MDPs for Off-Policy Policy Evaluation

We study the problem of off-policy policy evaluation (OPPE) in RL. In co...
research
06/03/2018

Exploration in Structured Reinforcement Learning

We address reinforcement learning problems with finite state and action ...
