Deep adaptive dynamic programming for nonaffine nonlinear optimal control problem with state constraints

by Jingliang Duan, et al.

This paper presents a constrained deep adaptive dynamic programming (CDADP) algorithm for solving general nonlinear optimal control problems with known dynamics. Unlike previous ADP algorithms, it handles state constraints directly. Both the policy and the value function are approximated by deep neural networks (NNs), which map the system state directly to the action and the value, respectively, without hand-crafted basis functions. The algorithm accounts for state constraints by recasting the policy improvement step as a constrained optimization problem, and adds a trust region constraint to prevent excessively large policy updates. We first linearize this constrained optimization problem locally into a quadratically-constrained quadratic programming problem, and then obtain the optimal update of the policy network parameters by solving its dual problem. We also propose a series of recovery rules for updating the policy when the primal problem is infeasible. In addition, parallel learners are employed to explore different state spaces, which stabilizes and accelerates learning. A vehicle control problem in a path-tracking task is used to demonstrate the effectiveness of the proposed method.
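
To make the "linearize, then solve the dual" step more concrete, the following is a minimal sketch of one simplified instance, not the paper's exact formulation. It assumes a linear objective `g @ d` in the parameter update `d`, a single linearized state constraint `c + b @ d <= 0`, and a quadratic trust region `0.5 * d @ H @ d <= delta`. The function name and the symbols `g`, `b`, `c`, `H`, and `delta` are hypothetical, chosen for illustration. The recovery rules mentioned in the abstract are not reproduced; the sketch simply returns `None` when the linearized problem is infeasible.

```python
import numpy as np


def trust_region_step(g, b, c, H, delta, eps=1e-12):
    """Minimize g @ d subject to c + b @ d <= 0 and 0.5 * d @ H @ d <= delta.

    Illustrative assumptions: H is symmetric positive definite, and g and b
    are nonzero. Returns the step d, or None if the problem is infeasible
    or degenerate (recovery rules are outside the scope of this sketch).
    """
    Hinv_g = np.linalg.solve(H, g)
    Hinv_b = np.linalg.solve(H, b)
    q = g @ Hinv_g  # g^T H^-1 g
    r = g @ Hinv_b  # g^T H^-1 b
    s = b @ Hinv_b  # b^T H^-1 b

    # Smallest reachable value of c + b @ d inside the trust region
    # is c - sqrt(2 * delta * s); if that is positive, no step is feasible.
    if c - np.sqrt(2.0 * delta * s) > 0.0:
        return None

    # Try the step that ignores the state constraint (multiplier nu = 0).
    d0 = -np.sqrt(2.0 * delta / q) * Hinv_g
    if c + b @ d0 <= 0.0:
        return d0

    # Otherwise both constraints are active. Setting the gradient of the
    # Lagrangian to zero gives d = -(1/lam) * H^-1 (g + nu * b); imposing
    # both constraints with equality yields lam and nu in closed form.
    A = q - r * r / s
    B = 2.0 * delta - c * c / s
    if A <= eps or B <= eps:
        return None  # degenerate geometry, not handled here
    lam = np.sqrt(A / B)       # multiplier of the trust region constraint
    nu = (lam * c - r) / s     # multiplier of the state constraint
    return -(Hinv_g + nu * Hinv_b) / lam


# Small 2-D example with an identity metric.
H = np.eye(2)
g = np.array([1.0, 0.0])
b = np.array([-1.0, 1.0])
d_active = trust_region_step(g, b, c=-0.5, H=H, delta=0.5)
d_free = trust_region_step(g, b, c=-2.0, H=H, delta=0.5)
d_infeasible = trust_region_step(g, b, c=2.0, H=H, delta=0.5)
```

In the first call the unconstrained trust region step would violate the linearized state constraint, so the returned step sits on the boundary of both constraints. In the second call the constraint is slack and the plain trust region step is returned. The third call is infeasible, which is the situation a recovery rule would have to handle.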


Generalized Policy Iteration for Optimal Control in Continuous Time

This paper proposes the Deep Generalized Policy Iteration (DGPI) algorit...

Constrained Differential Dynamic Programming: A primal-dual augmented Lagrangian approach

Trajectory optimization is an efficient approach for solving optimal con...

Safe Reinforcement Learning for Autonomous Vehicles through Parallel Constrained Policy Optimization

Reinforcement learning (RL) is attracting increasing interests in autono...

A tree structure algorithm for optimal control problems with state constraints

We present a tree structure algorithm for optimal control problems with ...

Ternary Policy Iteration Algorithm for Nonlinear Robust Control

The uncertainties in plant dynamics remain a challenge for nonlinear con...

Dynamic penalty function approach for constraints handling in reinforcement learning

Reinforcement learning (RL) is attracting attentions as an effective way...

Safety-aware Adaptive Reinforcement Learning with Applications to Brushbot Navigation

This paper presents a safety-aware learning framework that employs an ad...