Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning

10/27/2020
by   Aviral Kumar, et al.

We identify an implicit under-parameterization phenomenon in value-based deep RL methods that use bootstrapping: when value functions, approximated using deep neural networks, are trained with gradient descent using iterated regression onto target values generated by previous instances of the value network, more gradient updates decrease the expressivity of the current value network. We characterize this loss of expressivity in terms of a drop in the rank of the learned value network features, and show that this corresponds to a drop in performance. We demonstrate this phenomenon on widely studied domains, including Atari and Gym benchmarks, in both offline and online RL settings. We formally analyze this phenomenon and show that it results from a pathological interaction between bootstrapping and gradient-based optimization. We further show that mitigating implicit under-parameterization by controlling rank collapse improves performance.
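The rank-based characterization above can be illustrated with a short sketch. A common way to measure the "effective rank" of a feature matrix is via its singular value spectrum: the smallest number of singular values needed to capture all but a small fraction `delta` of the total spectrum. The function below is an illustrative implementation of this idea, not the paper's exact protocol; the threshold `delta=0.01` and the helper name `effective_rank` are assumptions for the example.

```python
import numpy as np

def effective_rank(features, delta=0.01):
    """Smallest k such that the top-k singular values account for
    at least a (1 - delta) fraction of the total singular value mass.

    features: (num_states, feature_dim) matrix of penultimate-layer
    outputs of the value network, evaluated on a batch of states.
    """
    singular_values = np.linalg.svd(features, compute_uv=False)
    cumulative = np.cumsum(singular_values) / np.sum(singular_values)
    return int(np.searchsorted(cumulative, 1.0 - delta) + 1)

rng = np.random.default_rng(0)

# A generic random feature matrix spreads mass across many directions,
# so its effective rank is close to the feature dimension...
healthy = rng.standard_normal((256, 64))

# ...whereas "collapsed" features that lie in a 2-dimensional subspace
# have effective rank 2, regardless of the nominal feature dimension.
collapsed = rng.standard_normal((256, 2)) @ rng.standard_normal((2, 64))
```

Tracking this quantity over the course of training is one way to observe the drop in feature rank that the abstract associates with degraded performance.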


