Learning with Safety Constraints: Sample Complexity of Reinforcement Learning for Constrained MDPs

08/01/2020
by Aria HasanzadeZonuzy, et al.

Many physical systems have underlying safety considerations that require the employed policy to satisfy a set of constraints. The analytical formulation usually takes the form of a Constrained Markov Decision Process (CMDP), where the constraints are some function of the occupancy measure generated by the policy. We focus on the case where the CMDP is unknown, and RL algorithms obtain samples to discover the model and compute an optimal constrained policy. Our goal is to characterize the relationship between safety constraints and the number of samples needed to ensure a desired level of accuracy, in terms of both objective maximization and constraint satisfaction, in a PAC sense. We explore a class of generative-model-based RL algorithms in which samples are taken initially to estimate a model. Our main finding is that, compared to the best known bounds for the unconstrained regime, the sample complexity of constrained RL algorithms is increased by a factor that is logarithmic in the number of constraints, which suggests that the approach may be easily utilized in real systems.
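
As a rough illustration of the "sample first, then estimate a model" step mentioned above, the following minimal sketch builds an empirical transition kernel by querying a generative model a fixed number of times per state-action pair. It is not the authors' algorithm: the function names (`estimate_model`, `toy_oracle`), the toy two-state MDP, and the sample count are all assumptions made for illustration, and the planning step that computes a constrained policy from the estimated model is omitted.

```python
import random
from collections import Counter

def estimate_model(sample_next_state, num_states, num_actions, n_samples, rng):
    """Estimate P(s' | s, a) by empirical frequencies.

    sample_next_state(s, a, rng) is a hypothetical generative-model oracle
    that returns one next-state index drawn from the true (unknown) kernel.
    """
    p_hat = [[[0.0] * num_states for _ in range(num_actions)]
             for _ in range(num_states)]
    for s in range(num_states):
        for a in range(num_actions):
            # Draw n_samples next states for this (s, a) pair and count them.
            counts = Counter(sample_next_state(s, a, rng)
                             for _ in range(n_samples))
            for s_next, c in counts.items():
                p_hat[s][a][s_next] = c / n_samples
    return p_hat

# Toy ground-truth kernel (assumed for illustration): TRUE_P[s][a][s'].
TRUE_P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.0, 1.0]],
]

def toy_oracle(s, a, rng):
    # Sample a next state according to the toy kernel.
    return rng.choices(range(2), weights=TRUE_P[s][a])[0]

rng = random.Random(0)
p_hat = estimate_model(toy_oracle, num_states=2, num_actions=2,
                       n_samples=2000, rng=rng)
```

In this sketch every state-action pair receives the same number of samples. How large that number must be for a given accuracy and confidence is what a sample-complexity analysis would quantify.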


