Penalized Proximal Policy Optimization for Safe Reinforcement Learning

05/24/2022
by Linrui Zhang, et al.

Safe reinforcement learning aims to learn the optimal policy while satisfying safety constraints, which is essential in real-world applications. However, current algorithms still struggle to perform efficient policy updates while satisfying hard constraints. In this paper, we propose Penalized Proximal Policy Optimization (P3O), which solves the cumbersome constrained policy iteration via a single minimization of an equivalent unconstrained problem. Specifically, P3O uses a simple yet effective penalty function to eliminate cost constraints and replaces the trust-region constraint with the clipped surrogate objective. We theoretically prove the exactness of the proposed method with a finite penalty factor and provide a worst-case analysis of the approximation error when the objective is evaluated on sampled trajectories. Moreover, we extend P3O to more challenging multi-constraint and multi-agent scenarios, which are less studied in previous work. Extensive experiments show that P3O outperforms state-of-the-art algorithms with respect to both reward improvement and constraint satisfaction on a set of constrained locomotion tasks.
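
To make the idea of a penalized, clipped objective concrete, here is a minimal sketch in plain Python. It is one plausible reading of the abstract, not necessarily the authors' exact formulation: it assumes a ReLU-style exact penalty on a clipped cost surrogate, added to the usual PPO clipped reward surrogate. The function name `penalized_clipped_loss`, the parameter `cost_margin` (estimated cost of the current policy minus the cost limit, scaled however the caller chooses), and the default values of `kappa` and `eps` are all hypothetical.

```python
from typing import Sequence


def clip(x: float, lo: float, hi: float) -> float:
    """Clamp x to the interval [lo, hi]."""
    return max(lo, min(hi, x))


def penalized_clipped_loss(
    ratios: Sequence[float],      # pi_new(a|s) / pi_old(a|s) per sample
    reward_adv: Sequence[float],  # reward advantage estimates
    cost_adv: Sequence[float],    # cost advantage estimates
    cost_margin: float,           # assumed: current cost estimate minus cost limit
    kappa: float = 20.0,          # assumed penalty factor (finite)
    eps: float = 0.2,             # assumed PPO clip range
) -> float:
    """Return a single unconstrained loss to minimize (illustrative only)."""
    n = len(ratios)
    reward_term = 0.0
    cost_term = 0.0
    for r, a_r, a_c in zip(ratios, reward_adv, cost_adv):
        r_clipped = clip(r, 1.0 - eps, 1.0 + eps)
        # Pessimistic (clipped) reward surrogate, negated for minimization.
        reward_term += -min(r * a_r, r_clipped * a_r)
        # Pessimistic (clipped) cost surrogate: take the larger cost estimate.
        cost_term += max(r * a_c, r_clipped * a_c)
    reward_term /= n
    cost_term = cost_term / n + cost_margin
    # ReLU penalty: active only when the surrogate predicts a violation.
    return reward_term + kappa * max(0.0, cost_term)
```

When the cost surrogate predicts that the constraint holds, the penalty vanishes and the loss reduces to the ordinary clipped reward objective; otherwise the cost term is weighted by `kappa` and dominates the update.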

research
11/11/2020

A Primal Approach to Constrained Policy Optimization: Global Optimality and Finite-Time Analysis

Safe reinforcement learning (SRL) problems are typically modeled as cons...
research
10/21/2019

IPO: Interior-point Policy Optimization under Constraints

In this paper, we study reinforcement learning (RL) algorithms to solve ...
research
07/02/2018

Policy Optimization With Penalized Point Probability Distance: An Alternative To Proximal Policy Optimization

This paper proposes a first order gradient reinforcement learning algori...
research
05/26/2023

Discrete-choice Multi-agent Optimization: Decentralized Hard Constraint Satisfaction for Smart Cities

Making Smart Cities more sustainable, resilient and democratic is emergi...
research
05/23/2023

Constrained Proximal Policy Optimization

The problem of constrained reinforcement learning (CRL) holds significan...
research
12/16/2018

A Logarithmic Barrier Method For Proximal Policy Optimization

Proximal policy optimization (PPO) has been proposed as a first-order opt...
research
06/19/2020

Set-Invariant Constrained Reinforcement Learning with a Meta-Optimizer

This paper investigates reinforcement learning with safety constraints. ...