
An Analysis of Categorical Distributional Reinforcement Learning
Distributional approaches to value-based reinforcement learning model th...

CertRL: Formalizing Convergence Proofs for Value and Policy Iteration in Coq
Reinforcement learning algorithms solve sequential decision-making probl...

Distributional Reinforcement Learning with Quantile Regression
In reinforcement learning an agent interacts with the environment by tak...

Breaking the Deadly Triad with a Target Network
The deadly triad refers to the instability of a reinforcement learning a...

Distributional reinforcement learning with linear function approximation
Despite many algorithmic advances, our theoretical understanding of prac...

Reinforcement Learning with Dynamic Boltzmann Softmax Updates
Value function estimation is an important task in reinforcement learning...

Convergence of Recursive Stochastic Algorithms using Wasserstein Divergence
This paper develops a unified framework, based on iterated random operat...

A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms
We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes. We demonstrate its effectiveness by presenting simple and unified proofs of convergence for a variety of commonly used methods. We show that value-based methods such as TD(λ) and Q-Learning have update rules which are contractive in the space of distributions of functions, thus establishing their exponentially fast convergence to a stationary distribution. We demonstrate that the stationary distribution obtained by any algorithm whose target is an expected Bellman update has a mean which is equal to the true value function. Furthermore, we establish that the distributions concentrate around their mean as the step-size shrinks. We further analyse the optimistic policy iteration algorithm, for which the contraction property does not hold, and formulate a probabilistic policy improvement property which entails the convergence of the algorithm.
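As a rough numerical illustration of two of these claims (the stationary mean matching the true value function, and concentration as the step-size shrinks), the sketch below runs synchronous TD(0) with a constant step-size on a two-state Markov reward process. Everything in it is an assumption made for illustration and is not taken from the paper: the transition matrix `P`, the rewards `R`, the discount `GAMMA`, the synchronous update scheme, and the use of long-run time averages as a stand-in for the stationary distribution.

```python
import random

# Hypothetical two-state Markov reward process (not from the paper).
GAMMA = 0.9
P = [[0.9, 0.1], [0.2, 0.8]]  # P[s][t]: probability of moving from s to t
R = [1.0, 0.0]                # reward received in each state


def true_values(iters=2000):
    """Approximate the true value function by iterating the expected Bellman update."""
    v = [0.0, 0.0]
    for _ in range(iters):
        v = [R[s] + GAMMA * sum(P[s][t] * v[t] for t in range(2)) for s in range(2)]
    return v


def td0_stationary_stats(alpha, steps=200_000, burn_in=20_000, seed=0):
    """Run synchronous TD(0) with constant step-size alpha.

    Returns per-state time averages (mean, variance) of the iterates after
    burn-in, used here as an estimate of the stationary distribution's moments.
    """
    rng = random.Random(seed)
    v = [0.0, 0.0]
    total = [0.0, 0.0]
    total_sq = [0.0, 0.0]
    for n in range(burn_in + steps):
        new_v = []
        for s in range(2):
            # Sample one next state per state and apply the sampled Bellman target.
            nxt = 0 if rng.random() < P[s][0] else 1
            target = R[s] + GAMMA * v[nxt]
            new_v.append(v[s] + alpha * (target - v[s]))
        v = new_v
        if n >= burn_in:
            for s in range(2):
                total[s] += v[s]
                total_sq[s] += v[s] ** 2
    mean = [total[s] / steps for s in range(2)]
    var = [total_sq[s] / steps - mean[s] ** 2 for s in range(2)]
    return mean, var


if __name__ == "__main__":
    print("true values:", true_values())
    for alpha in (0.1, 0.01):
        mean, var = td0_stationary_stats(alpha)
        print(f"alpha={alpha}: mean={mean}, variance={var}")
```

With a constant step-size the iterates keep fluctuating rather than settling on a point, so the quantities to compare are the long-run mean against `true_values()` and the variance at the two step-sizes. This is only an empirical check on a toy problem, not a substitute for the paper's proofs.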