Theory of Deep Q-Learning: A Dynamical Systems Perspective

08/25/2020
by Arunselvan Ramaswamy, et al.

Deep Q-Learning is an important algorithm used to solve sequential decision-making problems. It involves training a deep neural network, called a Deep Q-Network (DQN), to approximate the Q-function, a function associated with optimal decision making. Although it has been wildly successful in laboratory conditions, serious gaps between theory and practice prevent its use in the real world. In this paper, we present a comprehensive analysis of the popular and practical version of the algorithm, under realistic, verifiable assumptions. An important contribution is the characterization of its performance as a function of training. To do this, we view the algorithm as an evolving dynamical system. This facilitates associating a closely related measure process with training. The long-term behavior of Deep Q-Learning is then determined by the limit of this measure process. Empirical observations, such as the qualitative advantage of using experience replay and performance inconsistencies even after training, are explained using our analysis. Our theory is also general and accommodates state Markov processes with multiple stationary distributions.
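
For readers unfamiliar with the two ingredients the abstract names, the Q-function update and experience replay, the following is a minimal sketch and not the paper's formulation. It assumes a tabular Q-function (a list of lists) as a stand-in for the DQN, and the names ReplayBuffer and q_update, the discount gamma, and the step size lr are hypothetical choices made for illustration.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of past transitions (hypothetical helper)."""

    def __init__(self, capacity):
        # The oldest transitions are dropped once capacity is reached.
        self.data = deque(maxlen=capacity)

    def add(self, transition):
        # transition = (state, action, reward, next_state, done)
        self.data.append(transition)

    def sample(self, batch_size, rng):
        # Uniform sampling without replacement from the stored transitions.
        k = min(batch_size, len(self.data))
        return rng.sample(list(self.data), k)


def q_update(q, batch, gamma=0.9, lr=0.1):
    """Apply one Q-learning step per sampled transition.

    q is a table q[state][action]; a DQN would replace this table with a
    neural network and this loop with a gradient step on the squared error.
    """
    for s, a, r, s_next, done in batch:
        # Bootstrapped target: reward plus discounted best next-state value.
        target = r if done else r + gamma * max(q[s_next])
        q[s][a] += lr * (target - q[s][a])


# Usage: two states, two actions, one terminal transition with reward 1.
rng = random.Random(0)
buffer = ReplayBuffer(capacity=100)
buffer.add((0, 1, 1.0, 1, True))
q_table = [[0.0, 0.0], [0.0, 0.0]]
q_update(q_table, buffer.sample(batch_size=4, rng=rng))
```

Sampling past transitions from the buffer, rather than learning only from the most recent one, is what the abstract refers to as experience replay.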


research
12/22/2019

Deep Learning via Dynamical Systems: An Approximation Perspective

We build on the dynamical systems approach to deep learning, where deep ...
research
05/25/2023

Koopman Kernel Regression

Many machine learning approaches for decision making, such as reinforcem...
research
11/26/2020

Spectral Analysis and Stability of Deep Neural Dynamics

Our modern history of deep learning follows the arc of famous emergent d...
research
06/03/2020

Optimizing Neural Networks via Koopman Operator Theory

Koopman operator theory, a powerful framework for discovering the underl...
research
01/28/2022

Adversarial Decisions on Complex Dynamical Systems using Game Theory

We apply computational Game Theory to a unification of physics-based mod...
research
02/18/2021

Optimising Long-Term Outcomes using Real-World Fluent Objectives: An Application to Football

In this paper, we present a novel approach for optimising long-term tact...
research
05/06/2021

KuraNet: Systems of Coupled Oscillators that Learn to Synchronize

Networks of coupled oscillators are some of the most studied objects in ...
