Online Reinforcement Learning Control by Direct Heuristic Dynamic Programming: from Time-Driven to Event-Driven

06/16/2020
by   Qingtao Zhao, et al.

In this paper, time-driven learning refers to a machine learning method that updates the parameters of a prediction model continuously as new data arrives. Among existing approximate dynamic programming (ADP) and reinforcement learning (RL) algorithms, direct heuristic dynamic programming (dHDP) has been shown to be an effective tool for solving several complex learning control problems. It continuously updates the control policy and the critic as the system states evolve. Because of this continuous updating, it is desirable to prevent the time-driven dHDP from updating in response to insignificant system events such as noise. Toward this goal, we propose a new event-driven dHDP. By constructing a Lyapunov function candidate, we prove the uniform ultimate boundedness (UUB) of the system states and of the weights in the critic and control policy networks. Consequently, we show that the approximate control and cost-to-go function approach Bellman optimality within a finite bound. We also illustrate how the event-driven dHDP algorithm works in comparison with the original time-driven dHDP.
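The contrast between time-driven and event-driven updating can be illustrated with a minimal sketch. The abstract does not state the paper's triggering condition, so the rule below (update only when the state has moved more than a fixed threshold away from the last sampled state) is an assumed, generic event trigger, not the authors' method. The function names, the threshold values, and the synthetic trajectory are all hypothetical.

```python
import math
import random


def event_triggered(x, x_last, threshold):
    """Return True when the gap between the current state and the last
    sampled state exceeds the threshold (assumed trigger rule)."""
    gap = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_last)))
    return gap > threshold


def count_updates(states, threshold):
    """Count the learning updates performed along a state trajectory.

    A threshold of 0.0 mimics a time-driven scheme, which updates
    whenever the state changes at all.
    """
    x_last = states[0]
    updates = 0
    for x in states[1:]:
        if event_triggered(x, x_last, threshold):
            # In a dHDP controller, the critic and control policy
            # weights would be updated at this point.
            x_last = x
            updates += 1
    return updates


# Hypothetical trajectory: slow decay toward the origin plus small noise.
random.seed(0)
states = [
    (
        math.exp(-0.02 * k) + random.gauss(0.0, 0.001),
        -0.5 * math.exp(-0.02 * k) + random.gauss(0.0, 0.001),
    )
    for k in range(200)
]

time_driven = count_updates(states, threshold=0.0)
event_driven = count_updates(states, threshold=0.05)
print(time_driven, event_driven)
```

With a zero threshold every noisy sample triggers an update, whereas a positive threshold ignores changes that are small relative to it, so the event-driven count is lower on this trajectory.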

