Horizon-Free Reinforcement Learning for Latent Markov Decision Processes

10/20/2022
by Runlong Zhou, et al.

We study regret minimization for reinforcement learning (RL) in Latent Markov Decision Processes (LMDPs) with context in hindsight. We design a novel model-based algorithmic framework that can be instantiated with either a model-optimistic or a value-optimistic solver. We prove an O(√(M Γ S A K)) regret bound, where M is the number of contexts, S is the number of states, A is the number of actions, K is the number of episodes, and Γ ≤ S is the maximum transition degree of any state-action pair. The regret bound scales only logarithmically with the planning horizon, yielding the first (nearly) horizon-free regret bound for LMDPs. Key to our proof is an analysis of the total variance of alpha vectors, which is carefully bounded by a recursion-based technique. We complement our positive result with a novel Ω(√(M S A K)) regret lower bound with Γ = 2, which shows that our upper bound is minimax optimal when Γ is a constant. Our lower bound relies on new constructions of hard instances and an argument based on the symmetrization technique from theoretical computer science, both of which are technically different from existing lower-bound proofs for MDPs and thus may be of independent interest.
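
To make the gap between the two stated bounds concrete, the comparison below treats both rates as exact expressions, ignoring constants and logarithmic factors. That simplification is an assumption made here for illustration and is not a statement taken from the paper.

```latex
\[
\frac{\sqrt{M \Gamma S A K}}{\sqrt{M S A K}} \;=\; \sqrt{\Gamma},
\qquad 1 \le \Gamma \le S .
\]
```

Under this reading, the upper and lower bounds differ by a factor of at most √Γ, which is a constant whenever Γ is a constant (for example, √2 in the Γ = 2 setting used for the lower bound).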


