Deep adaptive dynamic programming for nonaffine nonlinear optimal control problem with state constraints

11/26/2019
by Jingliang Duan, et al.

This paper presents a constrained deep adaptive dynamic programming (CDADP) algorithm for solving general nonlinear optimal control problems with known dynamics. Unlike previous ADP algorithms, it handles state constraints directly. Both the policy and the value function are approximated by deep neural networks (NNs), which map the system state directly to the action and the value, respectively, without hand-crafted basis functions. The proposed algorithm accounts for state constraints by transforming the policy improvement process into a constrained optimization problem. A trust region constraint is also added to prevent excessively large policy updates. We first linearize this constrained optimization problem locally into a quadratically-constrained quadratic programming problem, and then obtain the optimal update of the policy network parameters by solving its dual problem. We also propose a series of recovery rules to update the policy when the primal problem is infeasible. In addition, parallel learners are employed to explore different regions of the state space, which stabilizes and accelerates learning. A vehicle control problem in a path-tracking task is used to demonstrate the effectiveness of the proposed method.
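The trust-region update described in the abstract can be illustrated with a minimal sketch. It is not the paper's formulation: it assumes a simplified problem in the style of constrained policy optimization, with a linear objective `g @ x`, one linearized state constraint `c + b @ x <= 0`, and a quadratic trust region `0.5 * x @ H @ x <= delta`, where `x` stands for the policy parameter update. The function name, the symbols, and the single recovery rule are all assumptions made for illustration. The dual is solved by bisection on the constraint multiplier `nu`, using the fact that the derivative of the dual with respect to `nu` equals the constraint violation at the corresponding step.

```python
import numpy as np


def constrained_trust_region_step(g, b, c, H, delta, iters=100):
    """Sketch: min_x g@x  s.t.  c + b@x <= 0  and  0.5*x@H@x <= delta.

    Assumes H is symmetric positive definite and g, b are nonzero.
    Returns (x, feasible). All names here are hypothetical.
    """
    Hinv_g = np.linalg.solve(H, g)
    Hinv_b = np.linalg.solve(H, b)
    s = b @ Hinv_b

    # Assumed recovery rule (borrowed from CPO, not from the paper):
    # if no point in the trust region satisfies the constraint,
    # take the step that reduces the constraint value the most.
    if c > 0 and c ** 2 / s > 2 * delta:
        return -np.sqrt(2 * delta / s) * Hinv_b, False

    def step(nu):
        # Minimizer of the Lagrangian for multiplier nu, scaled onto
        # the trust-region boundary.
        d = Hinv_g + nu * Hinv_b
        a = (g + nu * b) @ d
        return -np.sqrt(2 * delta / max(a, 1e-12)) * d

    def violation(nu):
        # Equals the derivative of the dual function with respect to nu.
        return c + b @ step(nu)

    # The unconstrained trust-region step already satisfies the constraint.
    if violation(0.0) <= 0:
        return step(0.0), True

    # Bracket the root of the (nonincreasing) violation, then bisect.
    lo, hi = 0.0, 1.0
    for _ in range(60):
        if violation(hi) <= 0:
            break
        lo, hi = hi, 2 * hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if violation(mid) > 0:
            lo = mid
        else:
            hi = mid
    return step(hi), True
```

For example, with `H` set to the 2x2 identity, `g = [1, 0]`, `b = [-1, -1]`, `c = -0.5`, and `delta = 0.5`, the unconstrained step violates the constraint, so the returned step lies on both the trust-region boundary and the constraint boundary. Raising `c` to `2.0` makes the problem infeasible and triggers the assumed recovery step.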

research
09/11/2019

Generalized Policy Iteration for Optimal Control in Continuous Time

This paper proposes the Deep Generalized Policy Iteration (DGPI) algorit...
research
10/27/2022

Constrained Differential Dynamic Programming: A primal-dual augmented Lagrangian approach

Trajectory optimization is an efficient approach for solving optimal con...
research
03/03/2020

Safe Reinforcement Learning for Autonomous Vehicles through Parallel Constrained Policy Optimization

Reinforcement learning (RL) is attracting increasing interests in autono...
research
06/26/2023

Beyond dynamic programming

In this paper, we present Score-life programming, a novel theoretical ap...
research
07/14/2020

Ternary Policy Iteration Algorithm for Nonlinear Robust Control

The uncertainties in plant dynamics remain a challenge for nonlinear con...
research
09/14/2023

A Unified Perspective on Multiple Shooting In Differential Dynamic Programming

Differential Dynamic Programming (DDP) is an efficient computational too...
