Beyond Target Networks: Improving Deep Q-learning with Functional Regularization

06/04/2021
by Alexandre Piché et al.

Target networks are at the core of recent successes in reinforcement learning. They stabilize training by using old parameters to estimate the Q-values, but this also limits the propagation of newly encountered rewards, which can ultimately slow down training. In this work, we propose an alternative training method based on functional regularization that does not have this deficiency. Unlike target networks, our method uses up-to-date parameters to estimate the target Q-values, thereby speeding up training while maintaining stability. Surprisingly, in some cases, we can show that target networks are a special, restricted type of functional regularizer. Using this approach, we show empirical improvements in sample efficiency and performance across a range of Atari and simulated robotics environments.
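
To make the contrast in the abstract concrete, here is a minimal per-transition sketch of the two losses. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the form of the regularizer, so the squared penalty toward a lagging network's Q-value, the weight `kappa`, and all function and argument names below are hypothetical.

```python
# Illustrative sketch only. The penalty form, `kappa`, and all names are
# assumptions made for this example; the abstract does not specify them.

def td_target(reward, next_q_values, gamma, done):
    """Bootstrapped target r + gamma * max_a' Q(s', a').

    No bootstrapping is applied when the transition ends the episode.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def target_network_loss(q_sa, reward, next_q_lagging, gamma, done):
    """Squared TD error where the target comes from old (lagging) parameters."""
    y = td_target(reward, next_q_lagging, gamma, done)
    return (q_sa - y) ** 2


def functional_regularization_loss(
    q_sa, q_sa_lagging, reward, next_q_online, gamma, done, kappa
):
    """Squared TD error with a target from up-to-date parameters, plus an
    assumed penalty that keeps Q(s, a) close to the lagging network's value.

    In a gradient-based implementation the target `y` would be treated as a
    constant (no gradient flows through it).
    """
    y = td_target(reward, next_q_online, gamma, done)
    td_term = (q_sa - y) ** 2
    reg_term = kappa * (q_sa - q_sa_lagging) ** 2
    return td_term + reg_term


if __name__ == "__main__":
    # A newly encountered reward has already raised the online estimate of
    # the next state's value (2.0), but the lagging network still says 0.5.
    loss_tn = target_network_loss(
        q_sa=1.0, reward=1.0, next_q_lagging=[0.0, 0.5], gamma=0.9, done=False
    )
    loss_fr = functional_regularization_loss(
        q_sa=1.0, q_sa_lagging=0.5, reward=1.0,
        next_q_online=[0.0, 2.0], gamma=0.9, done=False, kappa=0.5,
    )
    print(loss_tn, loss_fr)
```

In this toy transition the target-network loss bootstraps from the stale next-state value, while the regularized loss bootstraps from the current one and relies on the penalty term for stability. How the lagging parameters are refreshed and how `kappa` is chosen are left open here.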

10/21/2022

Bridging the Gap Between Target Networks and Functional Regularization

Bootstrapping is behind much of the successes of Deep Reinforcement Lear...
06/26/2020

Transfer Learning via ℓ_1 Regularization

Machine learning algorithms typically require abundant data under a stat...
10/09/2019

Ctrl-Z: Recovering from Instability in Reinforcement Learning

When learning behavior, training data is often generated by the learner ...
04/14/2021

Learning Regularization Parameters of Inverse Problems via Deep Neural Networks

In this work, we describe a new approach that uses deep neural networks ...
11/21/2019

Sample-Efficient Reinforcement Learning with Maximum Entropy Mellowmax Episodic Control

Deep networks have enabled reinforcement learning to scale to more compl...
02/23/2020

Periodic Q-Learning

The use of target networks is a common practice in deep reinforcement le...
12/06/2021

Functional Regularization for Reinforcement Learning via Learned Fourier Features

We propose a simple architecture for deep reinforcement learning by embe...