Set-Invariant Constrained Reinforcement Learning with a Meta-Optimizer

06/19/2020
by Chuangchuang Sun, et al.

This paper investigates reinforcement learning with safety constraints. To make the constraint violation decrease monotonically, the constraints are treated as Lyapunov functions, and new linear constraints are imposed on the updating dynamics of the policy parameters such that the original safety set is forward-invariant in expectation. Because the new guaranteed-feasible constraints are imposed on the updating dynamics rather than on the policy parameters themselves, classic optimization algorithms are no longer applicable. To address this, we propose learning a neural network-based meta-optimizer that optimizes the objective while satisfying these linear constraints. Constraint satisfaction is achieved via projection onto a polytope defined by multiple linear inequality constraints, which can be solved analytically with our newly designed metric. Ultimately, the meta-optimizer trains the policy network to monotonically decrease the constraint violation while maximizing the cumulative reward. Numerical results validate the theoretical findings.
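The idea of constraining the update rather than the parameters can be illustrated with a minimal sketch. It is not the paper's method: it assumes a single constraint g(theta) <= 0, uses the plain Euclidean metric instead of the paper's newly designed one, and replaces the learned meta-optimizer with a fixed candidate step. The names `project_update`, `safe_step`, and the decay rate `alpha` are hypothetical. The linear condition imposed on the step d is grad_g(theta) . d <= -alpha * g(theta), which for a linear g shrinks the violation by a factor (1 - alpha) per step.

```python
import numpy as np

def project_update(d, a, b):
    """Euclidean projection of d onto the halfspace {x : a @ x <= b}.

    Assumes a is nonzero. This closed form covers one inequality only;
    a polytope with several inequalities needs more than this sketch.
    """
    excess = a @ d - b
    if excess <= 0.0:
        return d.copy()          # d already satisfies the constraint
    return d - (excess / (a @ a)) * a

def safe_step(theta, d, g, grad_g, alpha=0.5):
    """Apply candidate step d after projecting it so that
    grad_g(theta) @ step <= -alpha * g(theta) (assumed decay condition)."""
    a = grad_g(theta)
    b = -alpha * g(theta)
    return theta + project_update(d, a, b)

# Toy problem (assumed for illustration): keep theta[0] <= 1.
g = lambda t: t[0] - 1.0
grad_g = lambda t: np.array([1.0, 0.0])

theta = np.array([3.0, 0.0])      # starts infeasible, g(theta) = 2
candidate = np.array([1.0, 1.0])  # a step that would increase the violation

violations = [g(theta)]
for _ in range(3):
    theta = safe_step(theta, candidate, g, grad_g, alpha=0.5)
    violations.append(g(theta))
```

In this toy run the recorded violation halves at every step even though the unprojected candidate would have increased it, which is the monotone-decrease behavior the abstract describes, here only for a linear constraint and under the stated assumptions.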


research
06/20/2020

Accelerating Safe Reinforcement Learning with Constraint-mismatched Policies

We consider the problem of reinforcement learning when provided with a b...
research
05/16/2022

Reachability Constrained Reinforcement Learning

Constrained reinforcement learning (CRL) has gained significant interest...
research
10/21/2019

IPO: Interior-point Policy Optimization under Constraints

In this paper, we study reinforcement learning (RL) algorithms to solve ...
research
05/24/2022

Penalized Proximal Policy Optimization for Safe Reinforcement Learning

Safe reinforcement learning aims to learn the optimal policy while satis...
research
11/09/2019

Learning to Optimize in Swarms

Learning to optimize has emerged as a powerful framework for various opt...
research
06/30/2020

Guarantees for Tuning the Step Size using a Learning-to-Learn Approach

Learning-to-learn (using optimization algorithms to learn a new optimize...
research
03/06/2020

Lane-Merging Using Policy-based Reinforcement Learning and Post-Optimization

Many current behavior generation methods struggle to handle real-world t...
