Is High Variance Unavoidable in RL? A Case Study in Continuous Control

10/21/2021
by Johan Bjorck, et al.

Reinforcement learning (RL) experiments have notoriously high variance, and minor details can have disproportionately large effects on measured outcomes. This is problematic for creating reproducible research and also serves as an obstacle for real-world applications, where safety and predictability are paramount. In this paper, we investigate causes for this perceived instability. To allow for an in-depth analysis, we focus on a particularly popular setup with high variance: continuous control from pixels with an actor-critic agent. In this setting, we demonstrate that variance mostly arises early in training as a result of poor "outlier" runs, but that weight initialization and initial exploration are not to blame. We show that one cause of early variance is numerical instability, which leads to saturating nonlinearities. We investigate several fixes to this issue and find that one is surprisingly simple and effective: normalizing penultimate features. Addressing this learning instability allows for larger learning rates and significantly decreases the variance of outcomes. This demonstrates that the perceived variance in RL is not necessarily inherent to the problem definition and may be addressed through simple architectural modifications.
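A minimal sketch of what penultimate-feature normalization could look like in a PyTorch critic head is shown below. The class name, layer sizes, and the fixed `scale` radius are illustrative assumptions and not the authors' reference implementation; the point is simply that bounding the norm of the features feeding the final layer keeps pre-activation magnitudes from growing unchecked.

```python
import torch
import torch.nn as nn

class NormalizedCritic(nn.Module):
    """Critic head that L2-normalizes its penultimate features.

    Keeping the feature norm fixed limits pre-activation magnitudes,
    which is one way to avoid the saturating nonlinearities the paper
    identifies as a source of early-training variance. Names and sizes
    here are illustrative assumptions, not the authors' code.
    """
    def __init__(self, feature_dim=256, hidden_dim=256, scale=5.0):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
        )
        self.scale = scale            # fixed radius for the normalized features
        self.q_head = nn.Linear(hidden_dim, 1)

    def forward(self, features):
        h = self.trunk(features)
        # Project the penultimate features onto a sphere of radius `scale`
        # so their magnitude cannot blow up as training progresses.
        h = self.scale * h / (h.norm(dim=-1, keepdim=True) + 1e-8)
        return self.q_head(h)
```

Because the feature magnitude is held constant by construction, larger learning rates are less likely to push the network into saturation, which is the mechanism the abstract credits for the reduced variance.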


