A comment on stabilizing reinforcement learning

11/24/2021
by Pavel Osinenko, et al.

This is a short comment on the paper "Asymptotically Stable Adaptive-Optimal Control Algorithm With Saturating Actuators and Relaxed Persistence of Excitation" by Vamvoudakis et al. The stability of reinforcement learning (RL) agents remains a hard question, and that work proposed an on-policy approach with a suitable stability property, using a technique from adaptive control: a robustifying term added to the action. However, there is an issue with this approach to stabilizing RL, which we explain in this note. Furthermore, Vamvoudakis et al. seem to have made a fallacious assumption on the Hamiltonian under a generic policy. To provide a positive result, we not only point out this mistake but also show convergence of the critic neural network weights in a stochastic, continuous-time environment, provided certain conditions on the behavior policy hold.


research
04/25/2019

Continuous-Time Mean-Variance Portfolio Optimization via Reinforcement Learning

We consider continuous-time Mean-variance (MV) portfolio optimization pr...
research
07/02/2022

q-Learning in Continuous Time

We study the continuous-time counterpart of Q-learning for reinforcement...
research
01/28/2019

Making Deep Q-learning methods robust to time discretization

Despite remarkable successes, Deep Reinforcement Learning (DRL) is not r...
research
07/18/2023

Continuous-Time Reinforcement Learning: New Design Algorithms with Theoretical Insights and Performance Guarantees

Continuous-time nonlinear optimal control problems hold great promise in...
research
02/17/2020

Control Frequency Adaptation via Action Persistence in Batch Reinforcement Learning

The choice of the control frequency of a system has a relevant impact on...
research
04/22/2020

Stability-Guaranteed Reinforcement Learning for Contact-rich Manipulation

Reinforcement learning (RL) has had its fair share of success in contact...
research
12/18/2019

Distributional Reinforcement Learning for Energy-Based Sequential Models

Global Autoregressive Models (GAMs) are a recent proposal [Parshakova et...
