Improved Algorithms for Misspecified Linear Markov Decision Processes

09/12/2021
by Daniel Vial, et al.

For the misspecified linear Markov decision process (MLMDP) model of Jin et al. [2020], we propose an algorithm with three desirable properties. (P1) Its regret after K episodes scales as K max{ε_mis, ε_tol}, where ε_mis is the degree of misspecification and ε_tol is a user-specified error tolerance. (P2) Its space and per-episode time complexities remain bounded as K →∞. (P3) It does not require ε_mis as input. To our knowledge, this is the first algorithm satisfying all three properties. For concrete choices of ε_tol, we also improve existing regret bounds (up to log factors) while achieving either (P2) or (P3) (existing algorithms satisfy neither). At a high level, our algorithm generalizes (to MLMDPs) and refines the Sup-Lin-UCB algorithm, which Takemura et al. [2021] recently showed satisfies (P3) in the contextual bandit setting.
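As a purely illustrative reading of (P1), the display below writes the stated scaling as a formula and substitutes one hypothetical tolerance, ε_tol = 1/√K. This choice is an assumption made here for the arithmetic and is not taken from the paper; dependence on other problem parameters and on log factors is suppressed.

```latex
% Scaling stated in (P1), with other problem parameters and log factors suppressed
\mathrm{Regret}(K) \;\lesssim\; K \max\{\varepsilon_{\mathrm{mis}},\, \varepsilon_{\mathrm{tol}}\}

% Hypothetical choice \varepsilon_{\mathrm{tol}} = 1/\sqrt{K} (an assumption for illustration)
K \max\{\varepsilon_{\mathrm{mis}},\, 1/\sqrt{K}\}
  \;=\; \max\{K\,\varepsilon_{\mathrm{mis}},\, \sqrt{K}\}
```

Under this assumed choice, the bound is of order √K when ε_mis ≤ 1/√K and grows linearly in K, at rate ε_mis, otherwise.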


01/02/2021

A Provably Efficient Algorithm for Linear Markov Decision Process with Low Switching Cost

Many real-world applications, such as those in medical domains, recommen...
12/15/2020

Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes

We study reinforcement learning (RL) with linear function approximation ...
09/05/2019

√(n)-Regret for Learning in Markov Decision Processes with Function Approximation and Low Bellman Rank

In this paper, we consider the problem of online learning of Markov deci...
08/21/2020

Refined Analysis of FPL for Adversarial Markov Decision Processes

We consider the adversarial Markov Decision Process (MDP) problem, where...
11/04/2019

Controlling a random population

Bertrand et al. introduced a model of parameterised systems, where each ...
03/14/2019

Contextual Markov Decision Processes using Generalized Linear Models

We consider the recently proposed reinforcement learning (RL) framework ...
08/03/2009

Regret Bounds for Opportunistic Channel Access

We consider the task of opportunistic channel access in a primary system...