Damped Anderson Mixing for Deep Reinforcement Learning: Acceleration, Convergence, and Stabilization

10/17/2021
by Ke Sun, et al.

Anderson mixing has been heuristically applied to reinforcement learning (RL) algorithms to accelerate convergence and improve the sample efficiency of deep RL. Despite these empirical gains, a rigorous mathematical justification for the benefits of Anderson mixing in RL has not yet been put forward. In this paper, we provide deeper insights into a class of acceleration schemes built on Anderson mixing that improve the convergence of deep RL algorithms. Our main results establish a connection between Anderson mixing and quasi-Newton methods and prove that Anderson mixing increases the convergence radius of policy iteration schemes by an extra contraction factor. The analysis is rooted in the fixed-point iteration nature of RL. We further propose a stabilization strategy that introduces a stable regularization term in Anderson mixing and a differentiable, non-expansive MellowMax operator, which together allow both faster convergence and more stable behavior. Extensive experiments demonstrate that our proposed method enhances the convergence, stability, and performance of RL algorithms.
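
To make the fixed-point view in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It assumes the common difference form of Anderson mixing with a damping factor `beta` and a simple Tikhonov term `reg` on the mixing coefficients; the paper's exact regularizer may differ. The names `mellowmax` and `anderson_fixed_point`, the default values, and the 3-state toy problem are all hypothetical. The toy fixed point is a policy-evaluation Bellman equation, V = r + discount * P V.

```python
import numpy as np


def mellowmax(values, omega=5.0):
    """MellowMax: log(mean(exp(omega * v))) / omega, shifted by max(v) for stability."""
    v = np.asarray(values, dtype=float)
    shift = v.max()
    return shift + np.log(np.mean(np.exp(omega * (v - shift)))) / omega


def anderson_fixed_point(g, x0, memory=3, beta=1.0, reg=1e-10,
                         tol=1e-8, max_iter=500):
    """Solve x = g(x) with damped, regularized Anderson mixing (illustrative sketch).

    Returns (x, number of evaluations of g). memory=0 is plain damped iteration.
    """
    x = np.asarray(x0, dtype=float)
    xs, fs = [], []                          # recent iterates and their residuals
    for k in range(1, max_iter + 1):
        f = g(x) - x                         # fixed-point residual
        if np.max(np.abs(f)) < tol:
            return x, k
        xs.append(x)
        fs.append(f)
        xs, fs = xs[-(memory + 1):], fs[-(memory + 1):]
        if len(xs) == 1:
            x = x + beta * f                 # no history yet: plain damped step
            continue
        # Columns hold differences of consecutive iterates and residuals.
        dX = np.stack([b - a for a, b in zip(xs[:-1], xs[1:])], axis=1)
        dF = np.stack([b - a for a, b in zip(fs[:-1], fs[1:])], axis=1)
        # Assumed regularizer: min ||f - dF @ gamma||^2 + reg * ||gamma||^2
        gamma = np.linalg.solve(dF.T @ dF + reg * np.eye(dF.shape[1]), dF.T @ f)
        x = x + beta * f - (dX + beta * dF) @ gamma
    return x, max_iter


# Hypothetical 3-state policy-evaluation problem (values chosen for illustration).
P = np.array([[0.5, 0.3, 0.2],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])
r = np.array([1.0, 0.0, 2.0])
discount = 0.9


def bellman(V):
    return r + discount * P @ V


V_plain, n_plain = anderson_fixed_point(bellman, np.zeros(3), memory=0)
V_mixed, n_mixed = anderson_fixed_point(bellman, np.zeros(3), memory=3, beta=0.9)
print("evaluations of g:", n_plain, "(plain) vs", n_mixed, "(Anderson mixing)")
```

In this sketch a large `reg` shrinks the mixing coefficients toward zero, so the update falls back to the plain damped iteration; that is one simple way to trade acceleration for stability. `mellowmax` is included only to show the operator's form; in a control setting it would replace the hard max over action values inside the Bellman operator.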


research
09/07/2019

Regularized Anderson Acceleration for Off-Policy Deep Reinforcement Learning

Model-free deep reinforcement learning (RL) algorithms have been widely ...
research
09/25/2018

Anderson Acceleration for Reinforcement Learning

Anderson acceleration is an old and simple method for accelerating the c...
research
10/07/2021

Towards Understanding Distributional Reinforcement Learning: Regularization, Optimization, Acceleration and Sinkhorn Algorithm

Distributional reinforcement learning (RL) is a class of state-of-the-ar...
research
11/01/2019

Generalized Speedy Q-learning

In this paper, we derive a generalization of the Speedy Q-learning (SQL)...
research
09/07/2018

A Fast Anderson-Chebyshev Mixing Method for Nonlinear Optimization

Anderson mixing (or Anderson acceleration) is an efficient acceleration ...
research
07/04/2020

Discount Factor as a Regularizer in Reinforcement Learning

Specifying a Reinforcement Learning (RL) task involves choosing a suitab...
research
08/17/2018

Importance mixing: Improving sample reuse in evolutionary policy search methods

Deep neuroevolution, that is evolutionary policy search methods based on...
