Hamilton-Jacobi Deep Q-Learning for Deterministic Continuous-Time Systems with Lipschitz Continuous Controls

10/27/2020
by   Jeongho Kim, et al.

In this paper, we propose Q-learning algorithms for continuous-time deterministic optimal control problems with Lipschitz continuous controls. Our method is based on a new class of Hamilton-Jacobi-Bellman (HJB) equations derived from applying the dynamic programming principle to continuous-time Q-functions. A novel semi-discrete version of the HJB equation is proposed to design a Q-learning algorithm that uses data collected in discrete time without discretizing or approximating the system dynamics. We identify the condition under which the Q-function estimated by this algorithm converges to the optimal Q-function. For practical implementation, we propose the Hamilton-Jacobi DQN, which extends the idea of deep Q-networks (DQN) to our continuous control setting. This approach does not require actor networks or numerical solutions to optimization problems for greedy actions since the HJB equation provides a simple characterization of optimal controls via ordinary differential equations. We empirically demonstrate the performance of our method through benchmark tasks and high-dimensional linear-quadratic problems.
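The abstract's claim that greedy actions follow from an ordinary differential equation rather than from an optimization problem can be illustrated with a small sketch. It assumes (this is not stated in the abstract) that the control's rate of change is bounded in norm by a Lipschitz constant L, and that the greedy control moves at that maximal rate in the direction of the gradient of Q with respect to the action. The toy Q-function, the function names, and the step size h are all hypothetical and chosen only for illustration.

```python
import math

def toy_q(x, a):
    """Hypothetical Q-function: largest when the action equals -x."""
    return -sum((ai + xi) ** 2 for ai, xi in zip(a, x))

def grad_a_toy_q(x, a):
    """Gradient of toy_q with respect to the action a."""
    return [-2.0 * (ai + xi) for ai, xi in zip(a, x)]

def greedy_action_step(x, a, h, lipschitz):
    """One explicit Euler step of the assumed control ODE
    da/dt = L * grad_a Q / |grad_a Q|, over a sampling interval h.
    The action moves by at most h * L, so it stays Lipschitz in time."""
    g = grad_a_toy_q(x, a)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        # Assumption: keep the action unchanged at a stationary point.
        return list(a)
    return [ai + h * lipschitz * gi / norm for ai, gi in zip(a, g)]

x = [1.0, -2.0]   # current state (illustrative)
a = [0.0, 0.0]    # current action (illustrative)
h, L = 0.1, 1.0   # sampling interval and Lipschitz bound (illustrative)
a_next = greedy_action_step(x, a, h, L)
step_norm = math.sqrt(sum((u - v) ** 2 for u, v in zip(a_next, a)))
```

In this sketch no inner maximization over actions and no actor network is needed: the next action is obtained from a single gradient evaluation, which matches the qualitative description in the abstract but should not be read as the authors' exact update rule.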

