Improved Algorithms for Misspecified Linear Markov Decision Processes

09/12/2021
by Daniel Vial, et al.

For the misspecified linear Markov decision process (MLMDP) model of Jin et al. [2020], we propose an algorithm with three desirable properties. (P1) Its regret after K episodes scales as K max{ε_mis, ε_tol}, where ε_mis is the degree of misspecification and ε_tol is a user-specified error tolerance. (P2) Its space and per-episode time complexities remain bounded as K → ∞. (P3) It does not require ε_mis as input. To our knowledge, this is the first algorithm satisfying all three properties. For concrete choices of ε_tol, we also improve existing regret bounds (up to log factors) while achieving either (P2) or (P3) (existing algorithms satisfy neither). At a high level, our algorithm generalizes (to MLMDPs) and refines the Sup-Lin-UCB algorithm, which Takemura et al. [2021] recently showed satisfies (P3) in the contextual bandit setting.
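
To make the scaling in (P1) concrete, the following minimal Python sketch evaluates only the leading term K max{ε_mis, ε_tol}. It is an illustration, not the paper's algorithm or its full bound: the function name `leading_regret_term` is hypothetical, constants, log factors, and any dependence on other problem parameters are omitted, and the numeric values are illustrative rather than taken from the source.

```python
def leading_regret_term(num_episodes: int, eps_mis: float, eps_tol: float) -> float:
    """Leading term K * max(eps_mis, eps_tol) from (P1).

    Hypothetical helper for illustration only: constants, log factors,
    and other problem-dependent quantities are deliberately left out.
    """
    if num_episodes < 0 or eps_mis < 0 or eps_tol < 0:
        raise ValueError("all arguments must be nonnegative")
    return num_episodes * max(eps_mis, eps_tol)


K = 10_000  # illustrative number of episodes

# Tolerance below the misspecification level: eps_mis determines the term.
print(leading_regret_term(K, eps_mis=0.01, eps_tol=0.001))

# Tolerance above the misspecification level: eps_tol determines the term.
print(leading_regret_term(K, eps_mis=0.001, eps_tol=0.05))
```

The sketch shows that the term depends on whichever of the two quantities is larger, so choosing ε_tol below ε_mis does not reduce it further.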
