Stabilizing Off-Policy Deep Reinforcement Learning from Pixels

07/03/2022
by Edoardo Cetin, et al.

Off-policy reinforcement learning (RL) from pixel observations is notoriously unstable. As a result, many successful algorithms must combine different domain-specific practices and auxiliary losses to learn meaningful behaviors in complex environments. In this work, we provide novel analysis demonstrating that these instabilities arise from performing temporal-difference learning with a convolutional encoder and low-magnitude rewards. We show that this new visual deadly triad causes unstable training and premature convergence to degenerate solutions, a phenomenon we name catastrophic self-overfitting. Based on our analysis, we propose A-LIX, a method providing adaptive regularization to the encoder's gradients that explicitly prevents the occurrence of catastrophic self-overfitting using a dual objective. By applying A-LIX, we significantly outperform the prior state-of-the-art on the DeepMind Control and Atari 100k benchmarks without any data augmentation or auxiliary losses.
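As a rough illustration of why low-magnitude rewards matter for temporal-difference learning, the sketch below computes a standard one-step TD target and the share of it that comes from the bootstrapped estimate. This is not A-LIX and is not taken from the paper; the function names `td_target` and `bootstrap_fraction`, and the discount value 0.99, are assumptions chosen only for illustration.

```python
def td_target(reward, next_value, gamma=0.99, done=False):
    """Standard one-step TD target: r + gamma * V(s').

    The bootstrapped term is dropped at terminal states.
    (Hypothetical helper, not the paper's code.)
    """
    return reward + (0.0 if done else gamma * next_value)


def bootstrap_fraction(reward, next_value, gamma=0.99):
    """Share of the target's magnitude contributed by the bootstrapped term.

    Values close to 1.0 mean the regression target is driven mostly by the
    network's own prediction rather than by the observed reward.
    """
    boot = abs(gamma * next_value)
    total = abs(reward) + boot
    return boot / total if total > 0 else 0.0


if __name__ == "__main__":
    # Small reward: the target is dominated by the bootstrapped estimate.
    print(td_target(0.01, 1.0), bootstrap_fraction(0.01, 1.0))
    # Larger reward: the observed signal carries about half of the target.
    print(td_target(1.0, 1.0), bootstrap_fraction(1.0, 1.0))
```

When the reward is small relative to the predicted value, the target is almost entirely the critic's own estimate, which is one intuition for why the abstract lists low-magnitude rewards as an ingredient of the instability.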


research
05/29/2023

RLAD: Reinforcement Learning from Pixels for Autonomous Driving in Urban Environments

Current approaches of Reinforcement Learning (RL) applied in urban Auton...
research
04/11/2022

Evaluating Vision Transformer Methods for Deep Reinforcement Learning from Pixels

Vision Transformers (ViT) have recently demonstrated the significant pot...
research
04/26/2023

CROP: Towards Distributional-Shift Robust Reinforcement Learning using Compact Reshaped Observation Processing

The safe application of reinforcement learning (RL) requires generalizat...
research
09/14/2020

Decoupling Representation Learning from Reinforcement Learning

In an effort to overcome limitations of reward-driven feature learning i...
research
06/20/2018

A Dissection of Overfitting and Generalization in Continuous Reinforcement Learning

The risks and perils of overfitting in machine learning are well known. ...
research
06/04/2021

Cross-Trajectory Representation Learning for Zero-Shot Generalization in RL

A highly desirable property of a reinforcement learning (RL) agent – and...
research
05/17/2022

Robust Losses for Learning Value Functions

Most value function learning algorithms in reinforcement learning are ba...
