A Provably Efficient Algorithm for Linear Markov Decision Process with Low Switching Cost

by Minbo Gao, et al.

Many real-world applications, such as those in medical domains and recommendation systems, can be formulated as reinforcement learning problems with a large state space and only a small budget for the number of policy changes, i.e., a low switching cost. This paper focuses on the linear Markov Decision Process (MDP) recently studied in [Yang et al. 2019, Jin et al. 2020], where linear function approximation is used to generalize over the large state space. We present the first algorithm for linear MDPs with a low switching cost. Our algorithm achieves an O(√(d^3 H^4 K)) regret bound with a near-optimal O(dH log K) global switching cost, where d is the feature dimension, H is the planning horizon, and K is the number of episodes the agent plays. Our regret bound matches that of the best existing polynomial-time algorithm, due to [Jin et al. 2020], and our switching cost is exponentially smaller than theirs. When specialized to tabular MDPs, our switching cost bound improves on those in [Bai et al. 2019, Zhang et al. 2020]. We complement our positive result with an Ω(dH / log d) global switching cost lower bound for any no-regret algorithm.
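To give a feel for the "exponentially smaller" claim, the following minimal sketch compares the growth of dH log K with a baseline that may change its policy in every episode. It rests on several assumptions that are not in the abstract: the hidden constant in the O(dH log K) bound is set to 1, the baseline is charged up to K switches, and the values of d, H, and K are purely illustrative. The function names are hypothetical.

```python
import math


def low_switching_bound(d: int, H: int, K: int) -> float:
    """Return d * H * log(K): the O(dH log K) bound with the hidden
    constant set to 1 (an assumption made only for illustration)."""
    return d * H * math.log(K)


def every_episode_switches(K: int) -> int:
    """Upper bound on switches for a learner that may change its policy
    after every episode (an assumed baseline, charged K switches)."""
    return K


if __name__ == "__main__":
    d, H = 10, 20  # illustrative values, not taken from the paper
    for K in (10**4, 10**6, 10**8):
        low = low_switching_bound(d, H, K)
        base = every_episode_switches(K)
        print(f"K={K:>11,d}  dH log K ~ {low:,.0f}  per-episode baseline <= {base:,d}")
```

As K grows by a factor of 100, the logarithmic quantity grows only by an additive dH log 100, while the per-episode baseline grows by the full factor of 100. For small K the logarithmic expression can exceed K, in which case the trivial bound of K switches is the tighter one.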




Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function

We present an algorithm based on the Optimism in the Face of Uncertainty...

Sample-Efficient Reinforcement Learning with loglog(T) Switching Cost

We study the problem of reinforcement learning (RL) with low (policy) sw...

Provably Efficient Q-Learning with Low Switching Cost

We take initial steps in studying PAC-MDP algorithms with limited adapti...

Improved Algorithms for Misspecified Linear Markov Decision Processes

For the misspecified linear Markov decision process (MLMDP) model of Jin...

Large-Scale Markov Decision Problems via the Linear Programming Dual

We consider the problem of controlling a fully specified Markov decision...

Accelerating the Computation of UCB and Related Indices for Reinforcement Learning

In this paper we derive an efficient method for computing the indices as...

Online Apprenticeship Learning

In Apprenticeship Learning (AL), we are given a Markov Decision Process ...