Recursive Constraints to Prevent Instability in Constrained Reinforcement Learning

01/20/2022
by Jaeyoung Lee, et al.

We consider the challenge of finding a deterministic policy for a Markov decision process that uniformly (in all states) maximizes one reward subject to a probabilistic constraint over a different reward. Existing solutions do not fully address our precise problem definition, which nevertheless arises naturally in the context of safety-critical robotic systems. This class of problem is known to be hard, and the combined requirements of determinism and uniform optimality can additionally create learning instability. In this work, after describing and motivating our problem with a simple example, we present a suitable constrained reinforcement learning algorithm that uses recursive constraints to prevent learning instability. Our proposed approach admits an approximate form that improves efficiency and is conservative with respect to the constraint.
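
As a reading aid, the problem statement can be sketched as a per-state constrained optimization. This is only one plausible formalization: every symbol below is assumed notation rather than the authors', and both the direction of the inequality and the exact form of the constraint event are placeholders, since the abstract does not specify them.

```latex
% Illustrative formalization only; all symbols are assumed notation.
% Requires amsmath for \operatorname* and \text.
% S: state space; \Pi_{\mathrm{det}}: set of deterministic policies;
% V_r^{\pi}(s): expected return of the maximized reward r from state s under \pi;
% E_c: an event defined through the second (constraint) reward c;
% \theta \in [0,1]: a probability threshold.
\[
  \pi^{*} \in \operatorname*{arg\,max}_{\pi \in \Pi_{\mathrm{det}}} V_r^{\pi}(s)
  \quad \text{subject to} \quad
  P^{\pi}\!\left( E_c \mid s_0 = s \right) \ge \theta ,
  \qquad \text{for all } s \in S .
\]
```

Under this reading, uniform optimality asks for a single deterministic policy that solves the constrained problem from every start state at once, which is the combination of requirements the abstract links to learning instability.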
