Learning Dynamics and Generalization in Reinforcement Learning

06/05/2022
by   Clare Lyle, et al.
0

Solving a reinforcement learning (RL) problem poses two competing challenges: fitting a potentially discontinuous value function, and generalizing well to new observations. In this paper, we analyze the learning dynamics of temporal difference algorithms to gain novel insight into the tension between these two objectives. We show theoretically that temporal difference learning encourages agents to fit non-smooth components of the value function early in training, and at the same time induces the second-order effect of discouraging generalization. We corroborate these findings in deep RL agents trained on a range of environments, finding that neural networks trained using temporal difference algorithms on dense reward tasks exhibit weaker generalization between states than randomly initialized networks and networks trained with policy gradient methods. Finally, we investigate how post-training policy distillation may avoid this pitfall, and show that this approach improves generalization to novel environments in the ProcGen suite and improves robustness to input perturbations.

READ FULL TEXT

page 5

page 19

page 23

research
06/16/2020

Parameter-based Value Functions

Learning value functions off-policy is at the core of modern Reinforceme...
research
10/03/2018

Comparison of Reinforcement Learning algorithms applied to the Cart Pole problem

Designing optimal controllers continues to be challenging as systems are...
research
03/13/2020

Interference and Generalization in Temporal Difference Learning

We study the link between generalization and interference in temporal-di...
research
04/20/2022

Understanding and Preventing Capacity Loss in Reinforcement Learning

The reinforcement learning (RL) problem is rife with sources of non-stat...
research
10/18/2022

Rethinking Value Function Learning for Generalization in Reinforcement Learning

We focus on the problem of training RL agents on multiple training envir...
research
07/10/2023

Dynamics of Temporal Difference Reinforcement Learning

Reinforcement learning has been successful across several applications i...
research
05/11/2021

Return-based Scaling: Yet Another Normalisation Trick for Deep RL

Scaling issues are mundane yet irritating for practitioners of reinforce...

Please sign up or login with your details

Forgot password? Click here to reset