On Bellman's principle of optimality and Reinforcement learning for safety-constrained Markov decision process

02/25/2023
by Rahul Misra et al.

We study optimality for safety-constrained Markov decision processes, the underlying framework for safe reinforcement learning. Specifically, we consider a constrained Markov decision process (with finite states and finite actions) in which the decision maker must reach a target set while avoiding unsafe sets with certain probabilistic guarantees. The underlying Markov chain for any control policy is therefore multichain, since by definition there exist a target set and an unsafe set. The decision maker must also be optimal (with respect to a cost function) while navigating to the target set, which gives rise to a multi-objective optimization problem. We highlight the fact that Bellman's principle of optimality may not hold for constrained Markov decision problems with an underlying multichain structure, as shown by a counterexample. We resolve the counterexample by formulating the aforementioned multi-objective optimization problem as a zero-sum game and thereafter construct an asynchronous value iteration scheme for the Lagrangian (similar to Shapley's algorithm). Finally, we consider the corresponding reinforcement learning problem and construct a modified Q-learning algorithm for learning the Lagrangian from data. We also provide a lower bound on the number of iterations required for learning the Lagrangian, together with corresponding error bounds.
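To make the Lagrangian idea concrete, here is a minimal sketch of a generic primal-dual tabular Q-learning scheme on a hypothetical 1-D chain MDP (the environment, the safety budget, and all hyperparameters are illustrative assumptions, not the authors' setting, and this is not their exact modified Q-learning algorithm): the primal step learns Q-values of the Lagrangian immediate cost, cost + λ·safety, while the dual step raises the multiplier λ whenever the discounted measure of unsafe visits exceeds a budget.

```python
import random

random.seed(0)

# Toy 1-D chain: states 0..4; state 4 is the target set, state 0 is unsafe.
# Actions: 0 = left, 1 = right. Transitions are noisy (10% flipped move).
N_STATES, N_ACTIONS = 5, 2
TARGET, UNSAFE = 4, 0

def step(s, a):
    move = 1 if a == 1 else -1
    if random.random() < 0.1:
        move = -move
    s2 = min(max(s + move, 0), N_STATES - 1)
    cost = 1.0                          # running cost (minimized)
    safety = 1.0 if s2 == UNSAFE else 0.0
    done = s2 == TARGET
    return s2, cost, safety, done

gamma, alpha, lam, lam_lr = 0.95, 0.1, 0.0, 0.01
budget = 0.1                            # allowed discounted unsafe-visit measure
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

for episode in range(2000):
    s, disc, safety_return = 2, 1.0, 0.0
    for t in range(50):
        # epsilon-greedy action selection (greedy = minimal Lagrangian Q-value)
        if random.random() < 0.1:
            a = random.randrange(N_ACTIONS)
        else:
            a = min(range(N_ACTIONS), key=lambda b: Q[s][b])
        s2, cost, safety, done = step(s, a)
        lagr = cost + lam * safety      # Lagrangian immediate cost
        td_target = lagr + (0.0 if done else gamma * min(Q[s2]))
        Q[s][a] += alpha * (td_target - Q[s][a])
        safety_return += disc * safety
        disc *= gamma
        s = s2
        if done:
            break
    # Dual ascent: increase lambda when the safety budget is exceeded,
    # projecting back onto lambda >= 0.
    lam = max(0.0, lam + lam_lr * (safety_return - budget))

greedy = [min(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(N_STATES)]
```

From the interior states the learned greedy policy heads right toward the target, since moving left both lengthens the path and risks the λ-penalized unsafe state.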

