CrossNorm: Normalization for Off-Policy TD Reinforcement Learning

02/14/2019
by Aditya Bhatt, et al.

Off-policy Temporal Difference (TD) learning methods, when combined with function approximation, risk divergence, a phenomenon known as the deadly triad. It has long been observed that some feature representations work better than others. In this paper we investigate how feature normalization can prevent divergence and improve training. Our method, which we call CrossNorm, can be regarded as a new variant of batch normalization that re-centers data for the multi-modal distributions arising in off-policy TD updates, where the current inputs and the bootstrapped target inputs follow different distributions. We show empirically that CrossNorm improves the stability of the learning process. We apply CrossNorm to DDPG and TD3 and achieve stable training and improved performance across a range of MuJoCo benchmark tasks. Moreover, for the first time, we are able to train DDPG stably without the use of target networks.
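The abstract suggests a simple reading of the idea: compute normalization statistics jointly over the two input streams of a TD update rather than over each stream separately. Below is a minimal sketch under that assumption, in PyTorch; the name cross_norm and its two-stream interface are illustrative assumptions, not the authors' reference implementation.

import torch

def cross_norm(x_current, x_next, eps=1e-5):
    """Normalize both streams of a TD update with shared statistics.

    x_current: features of the (s, a) pairs drawn from the replay buffer.
    x_next:    features of the (s', a') pairs used in the bootstrap target.
    The names and interface are assumptions for illustration only.
    """
    # Pool the two streams so the mean and variance describe the
    # mixture distribution rather than either mode alone.
    pooled = torch.cat([x_current, x_next], dim=0)
    mean = pooled.mean(dim=0, keepdim=True)
    var = pooled.var(dim=0, unbiased=False, keepdim=True)

    def _norm(x):
        # Re-center and rescale with the shared statistics.
        return (x - mean) / torch.sqrt(var + eps)

    return _norm(x_current), _norm(x_next)

In a DDPG- or TD3-style critic update, the online inputs and the bootstrap-target inputs would both pass through such a shared normalization, so neither stream is re-centered around its own mode alone.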
