Optimism in Face of a Context: Regret Guarantees for Stochastic Contextual MDP

07/22/2022
by Orin Levy, et al.

We present regret minimization algorithms for stochastic contextual MDPs under a minimum reachability assumption, using access to an offline least squares regression oracle. We analyze three settings: the dynamics are known; the dynamics are unknown but independent of the context; and, most challenging, the dynamics are unknown and context-dependent. For the last setting, our algorithm obtains a regret bound of Õ(max{H, 1/p_min} · H |S|^{3/2} √(|A| T log(max{|ℱ|, |𝒫|}/δ))) with probability 1-δ, where 𝒫 and ℱ are finite and realizable function classes used to approximate the dynamics and rewards respectively, p_min is the minimum reachability parameter, S is the set of states, A is the set of actions, H is the horizon, and T is the number of episodes. To our knowledge, this is the first optimistic approach applied to contextual MDPs with general function approximation (i.e., without additional knowledge about the function class, such as linearity). In addition, we present a lower bound of Ω(√(T H |S| |A| ln(|ℱ|/|S|)/ln(|A|))) on the expected regret, which holds even when the dynamics are known.
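To make the oracle assumption concrete, the following is a minimal sketch of what an offline least squares regression oracle over a finite function class could look like. It is not the authors' algorithm: the names `least_squares_oracle`, `function_class`, and `data`, the sample layout, and the toy class of context-linear reward functions are all assumptions made for illustration.

```python
from typing import Callable, Hashable, Sequence, Tuple

# Hypothetical layout: each sample is ((context, state, action), observed_reward).
Sample = Tuple[Tuple[float, Hashable, Hashable], float]
RewardFn = Callable[[float, Hashable, Hashable], float]


def least_squares_oracle(function_class: Sequence[RewardFn],
                         data: Sequence[Sample]) -> RewardFn:
    """Return a member of the finite class with the smallest empirical squared loss."""
    def squared_loss(f: RewardFn) -> float:
        return sum((f(c, s, a) - r) ** 2 for (c, s, a), r in data)

    # Exhaustive search is possible only because the class is finite.
    return min(function_class, key=squared_loss)


# Toy finite class (an assumption): reward = slope * context, ignoring state and action.
slopes = [0.0, 0.5, 1.0]
function_class = [lambda c, s, a, w=w: w * c for w in slopes]

# Toy data generated exactly by slope 0.5, so the class is realizable on it.
data = [((0.2, "s0", "a0"), 0.1), ((0.8, "s1", "a1"), 0.4), ((0.4, "s0", "a1"), 0.2)]

best = least_squares_oracle(function_class, data)
```

In this toy run the oracle selects the slope-0.5 function, since it fits the data with zero squared loss. An analogous oracle over 𝒫 would regress on observed transitions rather than rewards.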


