A Provably Efficient Algorithm for Linear Markov Decision Process with Low Switching Cost

01/02/2021
by Minbo Gao, et al.

Many real-world applications, such as those in medical domains and recommendation systems, can be formulated as reinforcement learning problems with a large state space and only a small budget for the number of policy changes, i.e., a low switching cost. This paper focuses on the linear Markov Decision Process (MDP) recently studied in [Yang et al. 2019, Jin et al. 2020], where linear function approximation is used to generalize over the large state space. We present the first algorithm for linear MDP with a low switching cost. Our algorithm achieves an O(√(d^3 H^4 K)) regret bound with a near-optimal O(dH log K) global switching cost, where d is the feature dimension, H is the planning horizon, and K is the number of episodes the agent plays. Our regret bound matches that of the best existing polynomial-time algorithm by [Jin et al. 2020], and our switching cost is exponentially smaller than theirs. When specialized to tabular MDP, our switching cost bound improves on those in [Bai et al. 2019, Zhang et al. 2020]. We complement our positive result with an Ω(dH/log d) global switching cost lower bound for any no-regret algorithm.
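
To make the O(dH log K) scaling concrete, here is a minimal sketch of a lazy update rule of the kind commonly used to limit policy switches in linear settings: the policy is recomputed only when the determinant of some per-step Gram matrix has more than doubled since the last update. This is an illustration under stated assumptions, not necessarily the paper's algorithm, since the abstract does not describe the switching criterion. The function names, the random unit-norm features, and the regularizer `lam` are all hypothetical.

```python
import numpy as np


def switching_bound(d, H, K, lam=1.0):
    """Upper bound on switches under the doubling rule: each of the H
    log-determinants can grow by at most d * log(1 + K / (lam * d))."""
    return H * d * np.log2(1.0 + K / (lam * d))


def count_switches(d=5, H=3, K=2000, lam=1.0, seed=0):
    """Simulate K episodes of horizon H with synthetic features and count
    how often the determinant-doubling rule triggers a policy update."""
    rng = np.random.default_rng(seed)
    # One regularized Gram matrix per step h of the horizon (assumption).
    grams = [lam * np.eye(d) for _ in range(H)]
    # Log-determinants recorded at the most recent policy update.
    ref_logdet = [d * np.log(lam)] * H
    switches = 0
    for _ in range(K):
        for h in range(H):
            phi = rng.normal(size=d)
            phi /= max(1.0, np.linalg.norm(phi))  # enforce ||phi|| <= 1
            grams[h] += np.outer(phi, phi)
        logdets = [np.linalg.slogdet(G)[1] for G in grams]
        # Switch only if some determinant has more than doubled.
        if any(ld > ref + np.log(2.0) for ld, ref in zip(logdets, ref_logdet)):
            switches += 1
            ref_logdet = logdets
    return switches


if __name__ == "__main__":
    d, H, K = 5, 3, 2000
    print("switches:", count_switches(d, H, K))
    print("bound   :", switching_bound(d, H, K))
```

In this sketch an algorithm that recomputes its policy every episode would switch up to K times, while the doubling rule keeps the count at most logarithmic in K, which is the sense in which a log K switching cost is exponentially smaller.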

06/12/2019

Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function

We present an algorithm based on the Optimism in the Face of Uncertainty...
02/13/2022

Sample-Efficient Reinforcement Learning with loglog(T) Switching Cost

We study the problem of reinforcement learning (RL) with low (policy) sw...
05/30/2019

Provably Efficient Q-Learning with Low Switching Cost

We take initial steps in studying PAC-MDP algorithms with limited adapti...
09/12/2021

Improved Algorithms for Misspecified Linear Markov Decision Processes

For the misspecified linear Markov decision process (MLMDP) model of Jin...
01/06/2019

Large-Scale Markov Decision Problems via the Linear Programming Dual

We consider the problem of controlling a fully specified Markov decision...
09/28/2019

Accelerating the Computation of UCB and Related Indices for Reinforcement Learning

In this paper we derive an efficient method for computing the indices as...
02/13/2021

Online Apprenticeship Learning

In Apprenticeship Learning (AL), we are given a Markov Decision Process ...