Model-Free Characterizations of the Hamilton-Jacobi-Bellman Equation and Convex Q-Learning in Continuous Time

10/14/2022
by   Fan Lu, et al.
0

Convex Q-learning is a recent approach to reinforcement learning, motivated by the possibility of a firmer theory for convergence, and the possibility of making use of greater a priori knowledge regarding policy or value function structure. This paper explores algorithm design in the continuous time domain, with finite-horizon optimal control objective. The main contributions are (i) Algorithm design is based on a new Q-ODE, which defines the model-free characterization of the Hamilton-Jacobi-Bellman equation. (ii) The Q-ODE motivates a new formulation of Convex Q-learning that avoids the approximations appearing in prior work. The Bellman error used in the algorithm is defined by filtered measurements, which is beneficial in the presence of measurement noise. (iii) A characterization of boundedness of the constraint region is obtained through a non-trivial extension of recent results from the discrete time setting. (iv) The theory is illustrated in application to resource allocation for distributed energy resources, for which the theory is ideally suited.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
08/20/2020

Model-free optimal control of discrete-time systems with additive and multiplicative noises

This paper investigates the optimal control problem for a class of discr...
research
05/09/2017

Integral Policy Iterations for Reinforcement Learning Problems in Continuous Time and Space

Policy iteration (PI) is a recursive process of policy evaluation and im...
research
10/11/2014

Q-learning for Optimal Control of Continuous-time Systems

In this paper, two Q-learning (QL) methods are proposed and their conver...
research
09/10/2023

Convex Q Learning in a Stochastic Environment: Extended Version

The paper introduces the first formulation of convex Q-learning for Mark...
research
10/17/2022

Sufficient Exploration for Convex Q-learning

In recent years there has been a collective research effort to find new ...
research
08/08/2020

Convex Q-Learning, Part 1: Deterministic Optimal Control

It is well known that the extension of Watkins' algorithm to general fun...
research
02/18/2018

Estimating scale-invariant future in continuous time

Natural learners must compute an estimate of future outcomes that follow...

Please sign up or login with your details

Forgot password? Click here to reset