Feasible Policy Iteration

04/18/2023
by Yujie Yang, et al.

Safe reinforcement learning (RL) aims to solve an optimal control problem under safety constraints. Existing direct safe RL methods use the original constraint throughout the learning process. They either lack theoretical guarantees for the policy during iteration or suffer from infeasibility problems. To address this issue, we propose an indirect safe RL method called feasible policy iteration (FPI), which iteratively uses the feasible region of the last policy to constrain the current policy. The feasible region is represented by a feasibility function called the constraint decay function (CDF). The core of FPI is a region-wise policy update rule called feasible policy improvement: inside the feasible region, it maximizes the return under the CDF constraint; outside the feasible region, it minimizes the CDF. This update rule is always feasible and ensures that the feasible region expands monotonically and that the state-value function increases monotonically inside the feasible region. Using the feasible Bellman equation, we prove that FPI converges to the maximum feasible region and the optimal state-value function. Experiments on classic control tasks and Safety Gym show that our algorithms achieve lower constraint violations than the baselines, with comparable or higher performance.
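
The region-wise update described above can be illustrated with a small tabular sketch. This is not the authors' implementation. It assumes a finite state and action space, an action-wise form of the CDF (`q_cdf`), and the convention that a state is feasible when its CDF is at or below a `threshold`. The function name, argument names, and the threshold convention are all hypothetical and chosen only for illustration.

```python
import numpy as np


def feasible_policy_improvement(q_return, q_cdf, cdf_state, threshold=0.0):
    """One region-wise greedy update for a tabular problem (illustrative sketch).

    q_return[s, a]: estimated return of action a in state s under the last policy.
    q_cdf[s, a]:    estimated CDF of action a in state s (assumed action-wise form).
    cdf_state[s]:   CDF of the last policy at state s.
    threshold:      states with cdf_state <= threshold count as feasible
                    (assumed convention, not taken from the source).
    """
    n_states = q_return.shape[0]
    new_policy = np.empty(n_states, dtype=int)
    feasible = cdf_state <= threshold

    for s in range(n_states):
        allowed = q_cdf[s] <= threshold
        if feasible[s] and allowed.any():
            # Inside the feasible region: maximize return among actions
            # that satisfy the CDF constraint.
            masked = np.where(allowed, q_return[s], -np.inf)
            new_policy[s] = int(np.argmax(masked))
        else:
            # Outside the feasible region (or no action passes the check):
            # minimize the CDF.
            new_policy[s] = int(np.argmin(q_cdf[s]))
    return new_policy


# Toy usage: states 0 and 1 are feasible, state 2 is not.
q_return = np.array([[1.0, 5.0], [2.0, 3.0], [0.0, 9.0]])
q_cdf = np.array([[0.0, 0.4], [0.0, 0.0], [0.6, 0.3]])
cdf_state = np.array([0.0, 0.0, 0.3])
policy = feasible_policy_improvement(q_return, q_cdf, cdf_state)
```

In the toy data, state 0 gives up the higher-return action because it violates the CDF constraint, while state 2 ignores return entirely and picks the action with the smaller CDF. Because the constraint at each step is built from the last policy's own CDF, the last policy's action remains a candidate in feasible states, which is one way to read the claim that the update is always feasible.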


