A Provably Efficient Algorithm for Linear Markov Decision Process with Low Switching Cost

01/02/2021
by Minbo Gao, et al.

Many real-world applications, such as those in medical domains and recommendation systems, can be formulated as reinforcement learning problems with large state spaces and only a small budget for the number of policy changes, i.e., a low switching cost. This paper focuses on the linear Markov Decision Process (MDP) recently studied in [Yang et al 2019, Jin et al 2020], where linear function approximation is used to generalize over the large state space. We present the first algorithm for linear MDPs with low switching cost. Our algorithm achieves an O(√(d^3H^4K)) regret bound with a near-optimal O(dH log K) global switching cost, where d is the feature dimension, H is the planning horizon, and K is the number of episodes the agent plays. Our regret bound matches that of the best existing polynomial-time algorithm [Jin et al 2020], and our switching cost is exponentially smaller than theirs. When specialized to tabular MDPs, our switching cost bound improves on those in [Bai et al 2019, Zhang et al 2020]. We complement our positive result with an Ω(dH/log d) global switching cost lower bound for any no-regret algorithm.
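The abstract does not say how the switching cost is kept at O(dH log K). One standard device, used for example in the rarely switching variant of OFUL for linear bandits, is to recompute the policy only when the determinant of some step's Gram matrix Λ_h = λI + Σ φφ^T has more than doubled since the last switch. We assume that rule here purely for illustration; it is not necessarily the paper's exact criterion. With feature norms at most 1, trace(Λ_h) ≤ λd + K, so det(Λ_h) ≤ (λ + K/d)^d while it starts at λ^d. Each switch triggered by step h therefore raises log det(Λ_h) by more than log 2, giving at most d·log2(1 + K/(λd)) such switches per step and at most dH·log2(1 + K/(λd)) = O(dH log K) in total. The sketch below counts switches under this assumed rule. The function names and the random-feature generator are hypothetical.

```python
import math
import random


def count_switches(features, d, lam=1.0):
    """Count policy switches under an ASSUMED determinant-doubling trigger.

    features[k][h] is the d-dimensional feature vector seen at step h of
    episode k; each vector is assumed to have Euclidean norm at most 1.
    Lambda_h = lam * I + sum of phi phi^T over all data seen at step h.
    Before episode k the policy is recomputed (one switch) only if
    det(Lambda_h) has more than doubled since the last switch for some h.
    """
    H = len(features[0])
    # Inverse of Lambda_h for every step h, starting from (1 / lam) * I.
    inv = [[[1.0 / lam if i == j else 0.0 for j in range(d)] for i in range(d)]
           for _ in range(H)]
    logdet = [d * math.log(lam)] * H      # current log det(Lambda_h)
    logdet_ref = list(logdet)             # log det(Lambda_h) at the last switch
    switches = 0
    for episode in features:
        # Switching test, using only data from earlier episodes.
        if any(logdet[h] - logdet_ref[h] > math.log(2.0) for h in range(H)):
            switches += 1
            logdet_ref = list(logdet)
        # Play the episode with the current (possibly stale) policy, then
        # absorb its features into each Lambda_h.
        for h, phi in enumerate(episode):
            v = [sum(inv[h][i][j] * phi[j] for j in range(d)) for i in range(d)]
            q = sum(phi[i] * v[i] for i in range(d))  # phi^T Lambda_h^{-1} phi
            # Matrix determinant lemma: det(A + phi phi^T) = det(A) * (1 + q).
            logdet[h] += math.log(1.0 + q)
            # Sherman-Morrison update of the inverse (Lambda_h is symmetric).
            inv[h] = [[inv[h][i][j] - v[i] * v[j] / (1.0 + q) for j in range(d)]
                      for i in range(d)]
    return switches


def switch_bound(K, H, d, lam=1.0):
    """Worst case from the log-det argument: H * d * log2(1 + K / (lam * d))."""
    return H * d * math.log2(1.0 + K / (lam * d))


def random_unit_features(K, H, d, seed=0):
    """Hypothetical synthetic data: i.i.d. uniformly random unit vectors."""
    rng = random.Random(seed)
    data = []
    for _ in range(K):
        episode = []
        for _ in range(H):
            x = [rng.gauss(0.0, 1.0) for _ in range(d)]
            norm = math.sqrt(sum(t * t for t in x))
            episode.append([t / norm for t in x])
        data.append(episode)
    return data


if __name__ == "__main__":
    K, H, d = 2000, 3, 4
    data = random_unit_features(K, H, d)
    print(count_switches(data, d), "switches; bound", switch_bound(K, H, d))
```

The Sherman-Morrison and determinant-lemma updates keep each step at O(d^2) work without a matrix library. The point of the example is the contrast with updating the policy after every episode, which would cost K switches.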
