A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms

03/27/2020
by   Philip Amortila, et al.
We present a distributional approach to the theoretical analysis of reinforcement learning algorithms with constant step-sizes. We demonstrate its effectiveness by presenting simple and unified proofs of convergence for a variety of commonly-used methods. We show that value-based methods such as TD(λ) and Q-Learning have update rules which are contractive in the space of distributions of functions, thus establishing their exponentially fast convergence to a stationary distribution. We show that the stationary distribution obtained by any algorithm whose target is an expected Bellman update has a mean equal to the true value function. Furthermore, we establish that these distributions concentrate around their mean as the step-size shrinks. Finally, we analyse the optimistic policy iteration algorithm, for which the contraction property does not hold, and formulate a probabilistic policy improvement property which entails convergence of the algorithm.
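To illustrate the stationary-distribution view described above, here is a minimal sketch of constant step-size TD(0) on a small synthetic Markov reward process. With a fixed step-size the iterates do not converge to a point but keep fluctuating; the empirical mean of the post-burn-in iterates lands close to the true value function, and the spread shrinks as the step-size is reduced. The toy process, its numbers, and all variable names are illustrative assumptions, not material from the paper.

```python
import numpy as np

# Minimal sketch (assumed toy example): constant step-size TD(0) on a small
# synthetic Markov reward process. The long-run iterates fluctuate around the
# true value function, illustrating the stationary-distribution perspective.

rng = np.random.default_rng(0)

n_states = 3
gamma = 0.9
alpha = 0.05                      # constant step-size

# Toy Markov reward process: P[s, s'] transition probabilities, r[s] rewards.
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
r = np.array([1.0, 0.0, 2.0])

# True value function V = (I - gamma * P)^{-1} r, for comparison.
V_true = np.linalg.solve(np.eye(n_states) - gamma * P, r)

V = np.zeros(n_states)
s = 0
samples = []
for t in range(200_000):
    s_next = rng.choice(n_states, p=P[s])
    # TD(0) update with constant step-size; target is r[s] + gamma * V[s_next].
    V[s] += alpha * (r[s] + gamma * V[s_next] - V[s])
    s = s_next
    if t > 100_000:               # discard burn-in, keep "stationary" samples
        samples.append(V.copy())

samples = np.asarray(samples)
print("true V:          ", np.round(V_true, 3))
print("mean of iterates:", np.round(samples.mean(axis=0), 3))
print("std of iterates: ", np.round(samples.std(axis=0), 3))
```

Rerunning with a smaller `alpha` shrinks the reported standard deviation, matching the concentration behaviour stated in the abstract.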

