DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization

12/09/2021
by   Aviral Kumar, et al.
0

Despite overparameterization, deep networks trained via supervised learning are easy to optimize and exhibit excellent generalization. One hypothesis to explain this is that overparameterized deep networks enjoy the benefits of implicit regularization induced by stochastic gradient descent, which favors parsimonious solutions that generalize well on test inputs. It is reasonable to surmise that deep reinforcement learning (RL) methods could also benefit from this effect. In this paper, we discuss how the implicit regularization effect of SGD seen in supervised learning could in fact be harmful in the offline deep RL setting, leading to poor generalization and degenerate feature representations. Our theoretical analysis shows that when existing models of implicit regularization are applied to temporal difference learning, the resulting derived regularizer favors degenerate solutions with excessive "aliasing", in stark contrast to the supervised learning case. We back up these findings empirically, showing that feature representations learned by a deep network value function trained via bootstrapping can indeed become degenerate, aliasing the representations for state-action pairs that appear on either side of the Bellman backup. To address this issue, we derive the form of this implicit regularizer and, inspired by this derivation, propose a simple and effective explicit regularizer, called DR3, that counteracts the undesirable effects of this implicit regularizer. When combined with existing offline RL methods, DR3 substantially improves performance and stability, alleviating unlearning in Atari 2600 games, D4RL domains and robotic manipulation from images.

READ FULL TEXT
research
10/21/2022

Implicit Offline Reinforcement Learning via Supervised Learning

Offline Reinforcement Learning (RL) via Supervised Learning is a simple ...
research
10/21/2022

Bridging the Gap Between Target Networks and Functional Regularization

Bootstrapping is behind much of the successes of Deep Reinforcement Lear...
research
10/27/2020

Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning

We identify an implicit under-parameterization phenomenon in value-based...
research
07/05/2022

An Empirical Study of Implicit Regularization in Deep Offline RL

Deep neural networks are the most commonly used function approximators i...
research
07/04/2020

Discount Factor as a Regularizer in Reinforcement Learning

Specifying a Reinforcement Learning (RL) task involves choosing a suitab...
research
11/25/2017

Learning Less-Overlapping Representations

In representation learning (RL), how to make the learned representations...
research
09/29/2018

Generalization and Regularization in DQN

Deep reinforcement learning (RL) algorithms have shown an impressive abi...

Please sign up or login with your details

Forgot password? Click here to reset