Near-Optimal Sample Complexity Bounds for Constrained MDPs

06/13/2022
by Sharan Vaswani, et al.

In contrast to the advances in characterizing the sample complexity for solving Markov decision processes (MDPs), the optimal statistical complexity for solving constrained MDPs (CMDPs) remains unknown. We resolve this question by providing minimax upper and lower bounds on the sample complexity for learning near-optimal policies in a discounted CMDP with access to a generative model (simulator). In particular, we design a model-based algorithm that addresses two settings: (i) relaxed feasibility, where small constraint violations are allowed, and (ii) strict feasibility, where the output policy is required to satisfy the constraint. For (i), we prove that our algorithm returns an ϵ-optimal policy with probability 1 - δ by making Õ(SA log(1/δ) / ((1 - γ)^3 ϵ^2)) queries to the generative model, thus matching the sample complexity for unconstrained MDPs. For (ii), we show that the algorithm's sample complexity is upper-bounded by Õ(SA log(1/δ) / ((1 - γ)^5 ϵ^2 ζ^2)), where ζ is the problem-dependent Slater constant that characterizes the size of the feasible region. Finally, we prove a matching lower bound for the strict feasibility setting, thus obtaining the first near-minimax-optimal bounds for discounted CMDPs. Our results show that learning CMDPs is as easy as learning MDPs when small constraint violations are allowed, but inherently more difficult when we demand zero constraint violation.
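
To make the gap between the two settings concrete, the following is a minimal sketch that evaluates the leading terms of the two bounds quoted in the abstract. It assumes all constants and the logarithmic factors hidden by Õ are dropped, so the numbers indicate scaling only, not actual query counts. The function names (`relaxed_queries`, `strict_queries`, `strict_over_relaxed`) and the example parameter values are hypothetical and do not come from the paper.

```python
import math


def relaxed_queries(S, A, gamma, eps, delta):
    """Leading term of the relaxed-feasibility bound:
    S*A*log(1/delta) / ((1-gamma)^3 * eps^2).
    Constants and hidden log factors are omitted (assumption)."""
    return S * A * math.log(1.0 / delta) / ((1.0 - gamma) ** 3 * eps ** 2)


def strict_queries(S, A, gamma, eps, delta, zeta):
    """Leading term of the strict-feasibility bound:
    S*A*log(1/delta) / ((1-gamma)^5 * eps^2 * zeta^2).
    zeta is the Slater constant; constants and log factors omitted."""
    return S * A * math.log(1.0 / delta) / (
        (1.0 - gamma) ** 5 * eps ** 2 * zeta ** 2
    )


def strict_over_relaxed(gamma, zeta):
    """Ratio of the two leading terms: 1 / ((1-gamma)^2 * zeta^2).
    S, A, eps and delta cancel out."""
    return 1.0 / ((1.0 - gamma) ** 2 * zeta ** 2)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    S, A, gamma, eps, delta, zeta = 10, 4, 0.9, 0.1, 0.05, 0.5
    print("relaxed leading term:", relaxed_queries(S, A, gamma, eps, delta))
    print("strict leading term: ", strict_queries(S, A, gamma, eps, delta, zeta))
    print("strict / relaxed:    ", strict_over_relaxed(gamma, zeta))
```

Under these assumptions, the extra cost of strict feasibility is the factor 1/((1 - γ)^2 ζ^2), which grows as the discount factor approaches 1 or as the Slater constant ζ shrinks.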


research
08/11/2022

Best Policy Identification in Linear MDPs

We investigate the problem of best policy identification in discounted l...
research
06/13/2021

Towards Tight Bounds on the Sample Complexity of Average-reward MDPs

We prove new upper and lower bounds for sample complexity of finding an ...
research
03/17/2022

Near Instance-Optimal PAC Reinforcement Learning for Deterministic MDPs

In probably approximately correct (PAC) reinforcement learning (RL), an ...
research
12/09/2021

Reinforcement Learning with Almost Sure Constraints

In this work we address the problem of finding feasible policies for Con...
research
12/04/2020

Near-Optimal Model Discrimination with Non-Disclosure

Let θ_0,θ_1 ∈ℝ^d be the population risk minimizers associated to some lo...
research
09/18/2023

Exploring and Learning in Sparse Linear MDPs without Computationally Intractable Oracles

The key assumption underlying linear Markov Decision Processes (MDPs) is...
research
08/28/2019

Information-Theoretic Lower Bounds for Compressive Sensing with Generative Models

The goal of standard compressive sensing is to estimate an unknown vecto...
