Robust Losses for Learning Value Functions

05/17/2022
by Andrew Patterson, et al.

Most value function learning algorithms in reinforcement learning are based on the mean squared (projected) Bellman error. However, squared errors are known to be sensitive to outliers, both skewing the solution of the objective and producing high-magnitude, high-variance gradients. To control these high-magnitude updates, typical strategies in RL involve clipping gradients, clipping rewards, rescaling rewards, or clipping errors. While these strategies appear related to robust losses, like the Huber loss, they are built on semi-gradient update rules that do not minimize a known loss. In this work, we build on recent insights reformulating squared Bellman errors as a saddle-point optimization problem and propose saddle-point reformulations for a Huber Bellman error and an absolute Bellman error. We start from a formalization of robust losses, then derive sound gradient-based approaches to minimize these losses in both the online off-policy prediction and control settings. We characterize the solutions of the robust losses, providing insight into the problem settings where they define notably better solutions than the mean squared Bellman error. Finally, we show that the resulting gradient-based algorithms are more stable, for both prediction and control, with less sensitivity to meta-parameters.
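To see why the Huber loss yields bounded updates where the squared loss does not, consider a minimal NumPy sketch (not code from the paper; the threshold `tau` and the error values are illustrative). The Huber loss is quadratic for errors below `tau` and linear beyond it, so its gradient saturates at `tau` instead of growing with the error:

```python
import numpy as np

def squared_loss_grad(delta):
    # Gradient of 0.5 * delta^2: grows linearly with the error,
    # so an outlier error produces a high-magnitude update.
    return delta

def huber_loss_grad(delta, tau=1.0):
    # Gradient of the Huber loss with threshold tau: matches the
    # squared-loss gradient for |delta| <= tau, then saturates at
    # +/- tau, bounding the update magnitude for outlier errors.
    return np.clip(delta, -tau, tau)

errors = np.array([0.5, 2.0, 100.0])  # the last entry is an outlier
print(squared_loss_grad(errors))      # gradients: 0.5, 2.0, 100.0
print(huber_loss_grad(errors))        # gradients: 0.5, 1.0, 1.0
```

This bounded gradient is what reward clipping and error clipping approximate heuristically; the paper's contribution is minimizing such robust Bellman losses with sound gradient-based updates rather than semi-gradient rules.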
