Projection-Based Constrained Policy Optimization

10/07/2020
by   Tsung-Yen Yang, et al.

We consider the problem of learning control policies that optimize a reward function while satisfying constraints due to considerations of safety, fairness, or other costs. We propose a new algorithm, Projection-Based Constrained Policy Optimization (PCPO). This is an iterative method that optimizes policies in a two-step process: the first step performs a local reward improvement update, while the second step reconciles any constraint violation by projecting the policy back onto the constraint set. We theoretically analyze PCPO and provide a lower bound on reward improvement, and an upper bound on constraint violation, for each policy update. We further characterize the convergence of PCPO with projection based on two different metrics: L2 norm and Kullback-Leibler divergence. Our empirical results over several control tasks demonstrate that PCPO achieves superior performance, averaging more than 3.5 times less constraint violation and around 15% higher reward compared to state-of-the-art methods.
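
The two-step structure described in the abstract can be illustrated with a minimal sketch in parameter space. This is not the authors' implementation: it assumes a local approximation in which the reward gradient `g`, the cost-constraint gradient `a`, the current constraint offset `b` (negative when the current policy is feasible), a positive-definite matrix `H` standing in for the KL-divergence Hessian, and a trust-region size `delta` are all given. These names, and the function `two_step_update`, are hypothetical and chosen only for illustration. The second step shown here uses the L2 norm projection onto the linearized constraint set.

```python
import numpy as np


def two_step_update(theta, g, a, b, H, delta):
    """Illustrative two-step update on a local approximation (assumed setup).

    theta : current policy parameters
    g     : reward gradient (assumed nonzero)
    a     : gradient of the cost constraint (assumed nonzero)
    b     : constraint offset at theta; the linearized constraint is
            a @ (theta_new - theta) + b <= 0
    H     : positive-definite matrix approximating the KL Hessian (assumed)
    delta : trust-region size for the reward step
    """
    # Step 1: local reward improvement inside a quadratic trust region.
    H_inv_g = np.linalg.solve(H, g)
    step = np.sqrt(2.0 * delta / (g @ H_inv_g)) * H_inv_g
    theta_half = theta + step

    # Step 2: L2 projection onto the half-space defined by the linearized
    # constraint. If the intermediate point is already feasible, nothing moves.
    violation = a @ (theta_half - theta) + b
    theta_new = theta_half - (max(0.0, violation) / (a @ a)) * a
    return theta_half, theta_new


if __name__ == "__main__":
    theta = np.zeros(2)
    g = np.array([1.0, 0.0])   # reward improves along the first axis
    a = np.array([1.0, 0.0])   # cost also grows along the first axis
    half, new = two_step_update(theta, g, a, b=-0.1, H=np.eye(2), delta=0.5)
    print("after reward step:", half)
    print("after projection: ", new)
```

In this toy setting the reward step overshoots the constraint boundary and the projection pulls the parameters back onto it, which is the behavior the abstract attributes to the second step. Swapping the L2 projection for one measured in KL divergence would change only the second step.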

research
06/20/2020

Accelerating Safe Reinforcement Learning with Constraint-mismatched Policies

We consider the problem of reinforcement learning when provided with a b...
research
11/11/2020

A Primal Approach to Constrained Policy Optimization: Global Optimality and Finite-Time Analysis

Safe reinforcement learning (SRL) problems are typically modeled as cons...
research
06/21/2023

Inverse Constraint Learning and Generalization by Transferable Reward Decomposition

We present the problem of inverse constraint learning (ICL), which recov...
research
02/16/2020

First Order Optimization in Policy Space for Constrained Deep Reinforcement Learning

In reinforcement learning, an agent attempts to learn high-performing be...
research
05/28/2018

Reward Constrained Policy Optimization

Teaching agents to perform tasks using Reinforcement Learning is no easy...
research
02/22/2020

Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion

Deep reinforcement learning (RL) uses model-free techniques to optimize ...
research
02/29/2016

Easy Monotonic Policy Iteration

A key problem in reinforcement learning for control with general functio...
