Q-learning for Optimal Control of Continuous-time Systems

10/11/2014
by Biao Luo, et al.

In this paper, two Q-learning (QL) methods are proposed, and their convergence theory is established, for the model-free optimal control problem of general nonlinear continuous-time systems. By introducing a Q-function for continuous-time systems, policy iteration based QL (PIQL) and value iteration based QL (VIQL) algorithms are proposed that learn the optimal control policy from real system data rather than from a mathematical system model. It is proved that both the PIQL and VIQL methods generate a nonincreasing Q-function sequence that converges to the optimal Q-function. To implement the QL algorithms, the method of weighted residuals is applied to derive the parameter update rule. The developed PIQL and VIQL algorithms are essentially off-policy reinforcement learning approaches: the system data can be collected arbitrarily, which increases the exploration ability. With the data collected from the real system, the QL methods learn the optimal control policy offline, and the converged control policy is then applied to the real system. The effectiveness of the developed QL algorithms is verified through computer simulation.
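To make the off-policy idea concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a scalar, discrete-time linear surrogate system with a quadratic cost, and it substitutes ordinary least squares for the weighted-residual parameter update. All names (`features`, `piql_sketch`, `riccati_gain`) and all numeric values are hypothetical and chosen only for illustration. It shows policy-iteration Q-learning fitting a quadratic Q-function from arbitrarily sampled data, with the dynamics used only to generate that data.

```python
import numpy as np

def features(x, u):
    # Quadratic basis, so that Q(x, u) = theta . [x^2, 2*x*u, u^2].
    return np.stack([x * x, 2.0 * x * u, u * u], axis=-1)

def piql_sketch(a=0.9, b=0.5, q=1.0, rho=1.0, n_samples=200, n_iters=15, seed=0):
    """Policy-iteration Q-learning on an assumed scalar system x' = a*x + b*u."""
    rng = np.random.default_rng(seed)
    # Arbitrary exploratory data: inputs are NOT drawn from the policy being learned.
    x = rng.uniform(-1.0, 1.0, n_samples)
    u = rng.uniform(-1.0, 1.0, n_samples)
    x_next = a * x + b * u  # stands in for measurements; the learner never uses a, b
    cost = q * x**2 + rho * u**2

    k = 0.0  # initial gain for u = -k*x; admissible here because |a| < 1
    history = []
    for _ in range(n_iters):
        # Policy evaluation: solve Q(x, u) = cost + Q(x', -k*x') for theta.
        A = features(x, u) - features(x_next, -k * x_next)
        theta, *_ = np.linalg.lstsq(A, cost, rcond=None)
        # Policy improvement: minimising Q over u gives u = -(h_xu / h_uu) * x.
        k = theta[1] / theta[2]
        history.append(theta.copy())
    return k, np.array(history)

def riccati_gain(a, b, q, rho, n_iters=1000):
    # Model-based reference: fixed-point iteration of the scalar Riccati equation.
    p = q
    for _ in range(n_iters):
        p = q + a * a * p - (a * b * p) ** 2 / (rho + b * b * p)
    return a * b * p / (rho + b * b * p)

if __name__ == "__main__":
    k_learned, theta_history = piql_sketch()
    print("learned gain:", k_learned)
    print("Riccati gain:", riccati_gain(0.9, 0.5, 1.0, 1.0))
```

In this sketch the fitted Q-function parameters shrink from one iteration to the next, which mirrors the nonincreasing Q-function sequence described above. The same data set is reused at every iteration, which is what makes the scheme off-policy and offline.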


