
A Sample-Efficient Algorithm for Episodic Finite-Horizon MDP with Constraints
Constrained Markov Decision Processes (CMDPs) formalize sequential decision-making problems whose objective is to minimize a cost function while satisfying constraints on various other cost functions. In this paper, we consider the setting of episodic fixed-horizon CMDPs. We propose an online algorithm which leverages the linear programming formulation of the finite-horizon CMDP for repeated optimistic planning, and we provide a probably approximately correct (PAC) guarantee on the number of episodes needed to ensure an ϵ-optimal policy, i.e., one whose objective value is within ϵ of the optimal value and which satisfies the constraints within ϵ-tolerance, with probability at least 1-δ. The number of episodes needed is shown to be of the order 𝒪̃(SAC^2H^2/ϵ^2 · log(1/δ)), where C is an upper bound on the number of possible successor states for a state-action pair. Therefore, if C ≪ S, the number of episodes needed has linear dependence on the state- and action-space sizes S and A, respectively, and quadratic dependence on the time horizon H.
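The occupancy-measure linear program at the core of finite-horizon CMDP planning can be sketched as follows. This is a minimal illustration on a hand-made two-state, two-action, horizon-3 CMDP with known transitions, not the paper's algorithm: the optimistic-planning loop and confidence-set construction are omitted, and all numbers are invented for the example.

```python
import numpy as np
from scipy.optimize import linprog

# Tiny episodic CMDP (illustrative values only).
S, A, H = 2, 2, 3
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.05, 0.95]]])   # P[s, a, s'] transition kernel
c = np.array([[1.0, 0.2], [0.5, 0.0]])       # objective cost c(s, a) to minimize
d = np.array([[0.0, 1.0], [0.0, 0.8]])       # constraint cost d(s, a)
mu = np.array([1.0, 0.0])                    # initial state distribution
budget = 1.5                                 # require E[sum_h d(s_h, a_h)] <= budget

def idx(h, s, a):
    # Flat index of the occupancy variable x[h, s, a].
    return (h * S + s) * A + a

n = H * S * A
# Flow-conservation equalities, one per (h, s):
#   sum_a x[0, s, a] = mu(s)
#   sum_a x[h, s', a] = sum_{s,a} P(s'|s,a) x[h-1, s, a]
A_eq = np.zeros((H * S, n))
b_eq = np.zeros(H * S)
for s in range(S):
    for a in range(A):
        A_eq[s, idx(0, s, a)] = 1.0
    b_eq[s] = mu[s]
for h in range(1, H):
    for s2 in range(S):
        row = h * S + s2
        for a in range(A):
            A_eq[row, idx(h, s2, a)] = 1.0
        for s in range(S):
            for a in range(A):
                A_eq[row, idx(h - 1, s, a)] -= P[s, a, s2]

# Linear objective and the single linear constraint on expected d-cost.
c_vec = np.array([c[s, a] for h in range(H) for s in range(S) for a in range(A)])
A_ub = np.array([[d[s, a] for h in range(H) for s in range(S) for a in range(A)]])
b_ub = np.array([budget])

res = linprog(c_vec, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * n, method="highs")
occ = res.x.reshape(H, S, A)
# Recover a (possibly randomized) policy from the occupancy measure.
pi = occ / np.clip(occ.sum(axis=2, keepdims=True), 1e-12, None)
print("optimal constrained cost:", res.fun)
```

Here the constraint binds: unconstrained minimization of c would always play action 1, but that accrues d-cost near 1 per step over H = 3 steps, so the LP returns a randomized policy that mixes actions to stay within the budget. This is the basic LP that the paper's algorithm solves repeatedly with optimistically estimated transition probabilities.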