A Theoretical Analysis of Deep Q-Learning

01/01/2019
by Zhuoran Yang, et al.

Despite the great empirical success of deep reinforcement learning, its theoretical foundation is less well understood. In this work, we make the first attempt to theoretically understand the deep Q-network (DQN) algorithm (Mnih et al., 2015) from both algorithmic and statistical perspectives. Specifically, we focus on a slight simplification of DQN that fully captures its key features. Under mild assumptions, we establish the algorithmic and statistical rates of convergence for the action-value functions of the iterative policy sequence obtained by DQN. In particular, the statistical error characterizes the bias and variance that arise from approximating the action-value function with a deep neural network, while the algorithmic error converges to zero at a geometric rate. As a byproduct, our analysis provides justifications for the techniques of experience replay and the target network, which are crucial to the empirical success of DQN. Furthermore, as a simple extension of DQN, we propose the Minimax-DQN algorithm for two-player zero-sum Markov games. Borrowing the analysis of DQN, we also quantify the difference between the policies obtained by Minimax-DQN and the Nash equilibrium of the Markov game in terms of both the algorithmic and statistical rates of convergence.
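The two techniques the analysis justifies can be made concrete in code. Below is a minimal sketch of the simplified DQN update the paper studies, assuming PyTorch; the architecture, hyperparameters, and names such as QNetwork and dqn_update are illustrative placeholders, not the paper's exact construction. The target network freezes the regression target between syncs, so each phase of training solves a fixed least-squares problem, while experience replay supplies approximately independent samples for that regression.

```python
import copy
from collections import deque

import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Action-value function Q(s, .) represented by a small MLP."""

    def __init__(self, state_dim, num_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64),
            nn.ReLU(),
            nn.Linear(64, num_actions),
        )

    def forward(self, state):
        return self.net(state)


def dqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
    """One fitted-Q step: regress Q(s, a) onto y = r + gamma * max_a' Q_target(s', a')."""
    s, a, r, s2, done = batch
    with torch.no_grad():
        # Target network: held fixed between syncs, so the regression
        # target does not move while the Q-network is being fit.
        y = r + gamma * (1.0 - done) * target_net(s2).max(dim=1).values
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Experience replay: minibatches are sampled uniformly from stored transitions,
# breaking the temporal correlation of the trajectory and approximating
# i.i.d. regression data.
replay = deque(maxlen=100_000)

q_net = QNetwork(state_dim=4, num_actions=2)
target_net = copy.deepcopy(q_net)  # re-synced to q_net every K updates
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
```

Minimax-DQN changes only the regression target: in a two-player zero-sum Markov game, the max over actions is replaced by the minimax value of the matrix game that the target network induces at the next state, y = r + gamma * val(Q_target(s', ., .)). A sketch of that inner step via the classical linear-programming reduction, assuming SciPy (matrix_game_value is a hypothetical helper name):

```python
import numpy as np
from scipy.optimize import linprog


def matrix_game_value(payoff):
    """Value of a two-player zero-sum matrix game, max_x min_y x^T A y."""
    A = np.asarray(payoff, dtype=float)
    m, n = A.shape
    shift = A.min() - 1.0  # shift payoffs positive so the game value is > 0
    B = A - shift
    # Minimize sum(p) subject to B^T p >= 1, p >= 0; the value is 1 / sum(p).
    res = linprog(c=np.ones(m), A_ub=-B.T, b_ub=-np.ones(n),
                  bounds=[(0, None)] * m, method="highs")
    return 1.0 / res.fun + shift
```

For matching pennies, matrix_game_value([[1, -1], [-1, 1]]) returns 0.0, the value attained by the mixed equilibrium (1/2, 1/2); in Minimax-DQN this quantity plays the role that max_a' Q_target(s', a') plays in DQN.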

Related research:

07/20/2020 · Evolution toward a Nash equilibrium
In this paper, we study the dynamic behavior of Hedge, a well-known algo...

02/25/2020 · On Reinforcement Learning for Turn-based Zero-sum Markov Games
We consider the problem of finding Nash equilibrium for two-player turn-...

11/29/2020 · Minimax Sample Complexity for Turn-based Stochastic Game
The empirical success of Multi-agent reinforcement learning is encouragi...

04/25/2022 · A dynamic that evolves toward a Nash equilibrium
In this paper, we study an exponentiated multiplicative weights dynamic ...

12/10/2019 · A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation
Q-learning with neural network function approximation (neural Q-learning...

03/06/2021 · Zero-sum risk-sensitive continuous-time stochastic games with unbounded payoff and transition rates and Borel spaces
We study a finite-horizon two-person zero-sum risk-sensitive stochastic ...

10/28/2021 · Cooperative Deep Q-learning Framework for Environments Providing Image Feedback
In this paper, we address two key challenges in deep reinforcement learn...
