A Sample-Efficient Algorithm for Episodic Finite-Horizon MDP with Constraints

09/23/2020
by Krishna C. Kalagarla, et al.

Constrained Markov Decision Processes (CMDPs) formalize sequential decision-making problems whose objective is to minimize a cost function while satisfying constraints on various other cost functions. In this paper, we consider the setting of episodic fixed-horizon CMDPs. We propose an online algorithm that leverages the linear programming formulation of finite-horizon CMDPs for repeated optimistic planning, and we provide a probably approximately correct (PAC) guarantee on the number of episodes needed to ensure an ϵ-optimal policy, i.e., a policy whose objective value is within ϵ of the optimal value and which satisfies the constraints within ϵ-tolerance, with probability at least 1-δ. The number of episodes needed is shown to be of the order 𝒪̃((|S||A|C^2H^2/ϵ^2) log(1/δ)), where C is an upper bound on the number of possible successor states for a state-action pair. Therefore, if C ≪ |S|, the number of episodes needed has a linear dependence on the state and action space sizes |S| and |A|, respectively, and a quadratic dependence on the time horizon H.
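To make the scaling in the stated bound concrete, the following is a minimal sketch that evaluates only the leading term |S||A|C^2H^2/ϵ^2 · log(1/δ). The function name `episode_bound_scale` and all numeric inputs are hypothetical and chosen for illustration; the constants and polylogarithmic factors hidden by the 𝒪̃ notation are not given in the abstract and are omitted here, so the outputs are meaningful only as ratios, not as actual episode counts.

```python
import math


def episode_bound_scale(num_states, num_actions, max_successors,
                        horizon, epsilon, delta):
    """Leading term |S||A| C^2 H^2 / eps^2 * log(1/delta).

    Assumption: constants and polylog factors hidden by the O-tilde
    notation are dropped, so this is a relative scale only.
    """
    if epsilon <= 0 or not (0 < delta < 1):
        raise ValueError("need epsilon > 0 and 0 < delta < 1")
    size_term = num_states * num_actions * max_successors ** 2 * horizon ** 2
    return size_term / epsilon ** 2 * math.log(1 / delta)


# Hypothetical problem sizes, used only to compare scalings.
base = episode_bound_scale(100, 4, 5, 10, 0.1, 0.05)
double_horizon = episode_bound_scale(100, 4, 5, 20, 0.1, 0.05)
double_states = episode_bound_scale(200, 4, 5, 10, 0.1, 0.05)

# Doubling H multiplies the term by about 4 (quadratic in H);
# doubling |S| with C held fixed multiplies it by about 2 (linear in |S|).
print(double_horizon / base)
print(double_states / base)
```

The comparison holds C fixed while |S| grows, which corresponds to the C ≪ |S| regime mentioned in the abstract; if C grew with |S|, the dependence on the state space would no longer be linear.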


research
11/01/2021

Settling the Horizon-Dependence of Sample Complexity in Reinforcement Learning

Recently there is a surge of interest in understanding the horizon-depen...
research
10/10/2022

A policy gradient approach for Finite Horizon Constrained Markov Decision Processes

The infinite horizon setting is widely adopted for problems of reinforce...
research
02/03/2021

On Query-efficient Planning in MDPs under Linear Realizability of the Optimal State-value Function

We consider the problem of local planning in fixed-horizon Markov Decisi...
research
10/29/2015

Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning

Recently, there has been significant progress in understanding reinforce...
research
12/21/2016

ARES: Adaptive Receding-Horizon Synthesis of Optimal Plans

We introduce ARES, an efficient approximation algorithm for generating o...
research
03/04/2020

Exploration-Exploitation in Constrained MDPs

In many sequential decision-making problems, the goal is to optimize a u...
research
10/01/2022

Primal-dual regression approach for Markov decision processes with general state and action space

We develop a regression based primal-dual martingale approach for solvin...
