Solving Stabilize-Avoid Optimal Control via Epigraph Form and Deep Reinforcement Learning

05/23/2023
by Oswin So, et al.

Tasks for autonomous robotic systems commonly require stabilization to a desired region while maintaining safety specifications. However, solving this multi-objective problem is challenging when the dynamics are nonlinear and high-dimensional, as traditional methods do not scale well and are often limited to specific problem structures. To address this issue, we propose a novel approach to solve the stabilize-avoid problem via the solution of an infinite-horizon constrained optimal control problem (OCP). We transform the constrained OCP into epigraph form and obtain a two-stage optimization problem that optimizes over the policy in the inner problem and over an auxiliary variable in the outer problem. We then propose a new method for this formulation that combines an on-policy deep reinforcement learning algorithm with neural network regression. Compared to more traditional methods, our method trains more stably, avoids the instabilities caused by saddle-point finding, and does not impose specific requirements on the problem structure. We validate our approach on different benchmark tasks, ranging from low-dimensional toy examples to an F16 fighter jet with a 17-dimensional state space. Simulation results show that our approach consistently yields controllers that match or exceed the safety of existing methods while providing ten-fold increases in stability performance through larger regions of attraction.
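
The epigraph reformulation mentioned in the abstract can be sketched as follows. This is the generic textbook construction, not necessarily the paper's exact formulation. The symbols are assumptions introduced here for illustration: $J(\pi)$ is the cost of policy $\pi$, $h(\pi)$ is a constraint value with $h(\pi) \le 0$ meaning safe, and $z$ is the auxiliary variable. The last step assumes the inner minimum over $\pi$ is attained.

```latex
\begin{align}
  &\min_{\pi} \; J(\pi) \quad \text{s.t.} \quad h(\pi) \le 0
  && \text{(constrained OCP)} \\
  \iff\; &\min_{z,\,\pi} \; z \quad \text{s.t.} \quad J(\pi) \le z, \;\; h(\pi) \le 0
  && \text{(epigraph form)} \\
  \iff\; &\min_{z} \; z \quad \text{s.t.} \quad \min_{\pi} \, \max\{\, J(\pi) - z, \; h(\pi) \,\} \le 0
  && \text{(two-stage form)}
\end{align}
```

In the two-stage form, the inner problem over $\pi$ is unconstrained for a fixed $z$, and the outer problem is a one-dimensional search over $z$, which is consistent with the inner/outer split the abstract describes.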


