# Representations for Stable Off-Policy Reinforcement Learning

Reinforcement learning with function approximation can be unstable and even divergent, especially when combined with off-policy learning and Bellman updates. In deep reinforcement learning, these issues have been dealt with empirically by adapting and regularizing the representation, in particular with auxiliary tasks. This suggests that representation learning may provide a means to guarantee stability. In this paper, we formally show that there are indeed nontrivial state representations under which the canonical TD algorithm is stable, even when learning off-policy. We analyze representation learning schemes that are based on the transition matrix of a policy, such as proto-value functions, along three axes: approximation error, stability, and ease of estimation. In the most general case, we show that a Schur basis provides convergence guarantees, but is difficult to estimate from samples. For a fixed reward function, we find that an orthogonal basis of the corresponding Krylov subspace is an even better choice. We conclude by empirically demonstrating that these stable representations can be learned using stochastic gradient descent, opening the door to improved techniques for representation learning with deep networks.


## 1 Introduction

Value function learning algorithms are known to demonstrate divergent behavior under the combination of bootstrapping, function approximation, and off-policy data, what Sutton & Barto (2018) call the “deadly triad” (see also van Hasselt et al., 2018). In reinforcement learning theory, it is well-established that methods such as Q-learning and TD(0) enjoy no general convergence guarantees under linear function approximation and off-policy data (Baird, 1995; Tsitsiklis & Van Roy, 1996). Despite this potential for failure, Q-learning and other temporal-difference algorithms remain the methods of choice for learning value functions in practice due to their simplicity and scalability.

In deep reinforcement learning, instability has been mitigated empirically through the use of auxiliary tasks, which shape and regularize the representation that is learned by the neural network. Methods using auxiliary tasks concurrently optimize the value function loss and an auxiliary representation learning objective such as visual reconstruction of observation

(Jaderberg et al., 2016), latent transition and reward prediction (Gelada et al., 2019), adversarial value functions (Bellemare et al., 2019), or inverse kinematics (Pathak et al., 2017). In robotics, distributional reinforcement learning (Bellemare et al., 2017) in particular has proven a surprisingly effective auxiliary task (Bodnar et al., 2019; Vecerik et al., 2019; Cabi et al., 2019). While the stability of such methods remains an empirical phenomenon, it suggests that a carefully chosen representation learning algorithm may provide a means towards formally guaranteed stability of value function learning.

In this paper, we seek procedures for discovering representations that guarantee the stability of TD(0), a canonical algorithm for estimating the value function of a policy. We analyze the expected dynamics of TD(0), with the aim of characterizing representations under which TD(0) is provably stable. Learning dynamics of temporal-difference methods have been studied in depth in the context of a fixed state representation (Tsitsiklis & Van Roy, 1996; Borkar & Meyn, 2000; Yu & Bertsekas, 2009; Maei et al., 2009; Dalal et al., 2017). We go one step further by considering this representation as a component that can actively be shaped, and study stability guarantees that emerge from various representation learning schemes.

We show that the stability of a state representation is affected by: 1) the space of value functions it can express, and 2) how it parameterizes this space. We find a tight connection between stability and the geometry of the transition matrix, enabling us to provide stability conditions for algorithms that learn features from the transition matrix of a policy (Dayan, 1993; Mahadevan & Maggioni, 2007; Wu et al., 2018; Behzadian et al., 2019) and rewards (Petrik, 2007; Parr et al., 2007). Our analysis reveals that a number of popular representation learning algorithms, including proto-value functions, generally lead to representations that are not stable, despite their appealing approximation characteristics.

As special cases of a more general framework, we study two classes of stable representations. The first class consists of representations that are approximately invariant under the transition dynamics (Parr et al., 2008), while the second consists of representations that remain stable under reparameterization. From this study, we find that stable representations can be obtained from common matrix decompositions and furthermore, as solutions of simple iterative optimization procedures. Empirically, we find that different procedures trade off learnability, stability, and approximation error. In the large data regime, the Schur decomposition and a variant of the Krylov basis (Petrik, 2007) emerge as reliable techniques for obtaining a stable representation.

We conclude by demonstrating that these techniques can be operationalized using stochastic gradient descent on suitable loss functions. We show that the Schur decomposition arises from the task of predicting the expectation of one’s own features at the next time step, whereas a variant of the Krylov basis arises from the task of predicting future expected rewards. This is particularly significant, as both of these auxiliary tasks have in fact been heuristically proposed in prior work

(François-Lavet et al., 2018; Gelada et al., 2019). Our result confirms the validity of these auxiliary tasks, not only for improving approximation error but, more importantly, for taming the famed instabilities of off-policy learning.

## 2 Background

We consider a Markov decision process (MDP) on a finite state space S and finite action space A. The state transition distribution is given by P(·|s,a), the reward function by r:S×A→R, the initial state distribution by ν, and the discount factor by γ∈[0,1). We write n=|S||A|, and treat real-valued functions of state and action as vectors in Rn.

A stochastic policy π induces a Markov chain on state-action pairs with transition matrix Pπ. The value function Qπ for a policy π is the expected return conditioned on the starting state-action pair,

 Qπ(si,ai)=Eπ[∑t≥0γtr(st,at)|s0=si,a0=ai].

The value function also satisfies Bellman’s equation; in vector notation (Puterman, 1994),

 Qπ=r+γPπQπ,

from which we recover the concise Qπ=(I−γPπ)−1r.

### 2.1 Approximate Policy Evaluation

Approximate policy evaluation is the problem of estimating Qπ from a family of value functions, given a distribution ξ over transitions (c.f. Bertsekas, 2011). We refer to ξ as the data distribution, and define a diagonal matrix Ξ with the elements of ξ on the diagonal. If the data distribution is the stationary distribution of Pπ, the data is on-policy, and off-policy otherwise. We equip Rn with the inner product and norm induced by the data distribution: ⟨u,v⟩Ξ=u⊤Ξv and ∥v∥2Ξ=⟨v,v⟩Ξ. Most concepts from Euclidean inner products extend to this setting; see Appendix A for a review.

We consider a two-stage procedure for estimating value functions (Levine et al., 2017; Chung et al., 2019; Bertsekas, 2018). We first learn a representation, a d-dimensional mapping φ:S×A→Rd, through an explicit representation learning step. After a representation is learned, approximate policy evaluation is performed with the family of value functions linear in the representation φ: Qθ(s,a)=φ(s,a)⊤θ, where θ∈Rd is a vector of weights.

The representation φ corresponds to a matrix Φ∈Rn×d whose rows are the vectors φ(s,a) for different state-action pairs (s,a). For clarity of presentation, we assume that Φ has full rank. A representation is orthogonal if Φ⊤ΞΦ=I; these correspond to features which are normalized and uncorrelated. We write Span(Φ) to denote the subspace of value functions expressible using Φ, and Π the orthogonal projection operator onto Span(Φ), with closed form Π=Φ(Φ⊤ΞΦ)−1Φ⊤Ξ.
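For concreteness, the weighted projection can be checked numerically. The following sketch (with a randomly chosen Φ and ξ standing in for a learned representation and data distribution) verifies that Π=Φ(Φ⊤ΞΦ)−1Φ⊤Ξ is idempotent, fixes Span(Φ), and is self-adjoint in the Ξ-weighted inner product:

```python
import numpy as np

rng = np.random.default_rng(7)
n, d = 6, 3

# Hypothetical representation matrix and data distribution.
Phi = rng.standard_normal((n, d))
xi = rng.random(n); xi /= xi.sum()
Xi = np.diag(xi)

# Orthogonal projection onto Span(Phi) w.r.t. the <u, v>_Xi inner product.
Pi = Phi @ np.linalg.inv(Phi.T @ Xi @ Phi) @ Phi.T @ Xi

assert np.allclose(Pi @ Pi, Pi)            # idempotent
assert np.allclose(Pi @ Phi, Phi)          # leaves Span(Phi) fixed
assert np.allclose(Xi @ Pi, (Xi @ Pi).T)   # self-adjoint under <.,.>_Xi
```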

### 2.2 Temporal Difference Methods

TD fixed-point methods are a popular class of methods for approximate policy evaluation that attempt to find value functions satisfying Φθ=Π(r+γPπΦθ) (Bradtke & Barto, 1996; Gordon, 1995; Maei et al., 2009; Dann et al., 2014). If this equation has a fixed-point, the solution is unique (Lagoudakis & Parr, 2003) and can be expressed as

 θ∗TD=(Φ⊤Ξ(I−γPπ)Φ)−1Φ⊤Ξr.

We study TD(0), the canonical update rule for discovering this fixed point. With a step size η and transitions (s,a,r,s′,a′) sampled from the data distribution, TD(0) takes the update

 θk+1=θk−η∇θQθk(s,a)(Qθk(s,a)−(r+γQθk(s′,a′))).

In matrix form, this corresponds to an expected update over all state-action pairs:

 θk+1=θk−η(Φ⊤Ξ(I−γPπ)Φθk−Φ⊤Ξr). (1)

With appropriately chosen decay of the step size, the stochastic update will converge if the expected update converges (Benveniste et al., 1990; Tsitsiklis & Van Roy, 1996). However, these updates are not the gradient of any well-defined objective function except in special circumstances (Barnard, 1993; Ollivier, 2018), and hence do not inherit convergence properties from the classical optimization literature. The main aim of this paper is to provide conditions on the representation matrix Φ under which the update is convergent. We are especially interested in schemes that are convergent independent of the data distribution ξ.
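The expected update of equation 1 is easy to simulate. The sketch below (a randomly generated chain as a stand-in, with ξ set to the stationary distribution so that the classical on-policy guarantee applies) iterates the expected update and recovers the TD fixed point; the step size is derived from the eigenvalues of the iteration matrix so that the linear recursion is a contraction:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, gamma = 6, 3, 0.9

# Random transition matrix, reward, and features (illustrative stand-ins).
P = rng.random((n, n)); P /= P.sum(axis=1, keepdims=True)
r = rng.random(n)
Phi = rng.standard_normal((n, d))

# On-policy data distribution: the stationary distribution of P.
evals, evecs = np.linalg.eig(P.T)
xi = np.real(evecs[:, np.argmax(np.real(evals))]); xi /= xi.sum()
Xi = np.diag(xi)

A = Phi.T @ Xi @ (np.eye(n) - gamma * P) @ Phi
b = Phi.T @ Xi @ r
theta_star = np.linalg.solve(A, b)

# Pick eta so every eigenvalue of I - eta*A lies inside the unit circle
# (possible because Re(lambda) > 0 for on-policy data).
eta = 0.9 * min(2 * lam.real / abs(lam) ** 2 for lam in np.linalg.eigvals(A))
theta = np.zeros(d)
for _ in range(200_000):
    theta -= eta * (A @ theta - b)

assert np.max(np.abs(theta - theta_star)) < 1e-6
```

Off-policy, the same recursion can diverge for every positive step size, which is precisely the failure mode analyzed in Section 3.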

We will characterize the stability of TD(0) and a representation through the spectrum of relevant matrices. For a matrix A∈Rn×n, the spectrum is the set of eigenvalues of A, written as Spec(A). The spectral radius ρ(A) denotes the maximum magnitude of its eigenvalues. Stochastic transition matrices satisfy ρ(Pπ)=1. We consider a potentially nonsymmetric matrix A to be positive definite if all non-zero vectors x satisfy x⊤Ax>0.

### 2.3 Representation Learning

In reinforcement learning, a large class of methods has focused on constructing a representation from the transition and reward functions, beginning perhaps with proto-value functions (Mahadevan & Maggioni, 2007). Involving Pπ and r in the representation learning process is natural, since the value function is itself constructed from these two objects. As we shall later see, the stability criteria for these are also simple and coherent. Additionally, there is a large body of literature on the ease (or difficulty) with which these methods can be estimated from samples, and by proxy are amenable to gradient-descent schemes. Here we review the most common of these representation learning methods along with a few natural extensions. Table 1 shows how their construction arises from different matrix operations on Pπ and, in the case of the Krylov basis, on r.

Laplacian Representations: Proto-value functions (Mahadevan & Maggioni, 2007) capture the high-level structure of an environment, using the bottom eigenvectors of the normalized Laplacian of an undirected graph formed from environment transitions. This formalism extends to reversible Markov chains with on-policy data, but does not generalize to directional transitions, stochastic dynamics, and off-policy data. In the general setting, the Laplacian representation (Wu et al., 2018) uses the top eigenvectors of the symmetrized transition matrix (EigSymm) ½(Pπ+Ξ−1Pπ⊤Ξ). We demonstrate in Section 4.3 that when data is off-policy, modifying the representation to omit eigenvectors whose eigenvalues exceed a threshold can provide strong stability guarantees.

Singular Vector Representations: Representations using singular vectors have been well-studied in representation learning for RL, because they are expressive and often yield strong performance guarantees. Fast Feature Selection (Behzadian et al., 2019) uses the top left singular vectors of the transition matrix as features. Similarly, Stachenfeld et al. (2014) and Machado et al. (2018) use the top left singular vectors of the successor representation (Dayan, 1993), a time-based representation which predicts future state visitations: Ψ=(I−γPπ)−1. We discover in Section 3.4 that the SVD objective of minimizing the norm of the approximation error fails to preserve the spectral properties of transition matrices needed for stability, and can induce divergent behavior in TD(0). In contrast, we show that decompositions constrained to preserve the spectrum of the transition matrix, such as the Schur decomposition, guarantee stability and performance.

Reward-Informed Methods: If the reward structure of the problem is known a priori, a representation can focus its capacity on modelling future rewards and how they diffuse through the environment. Towards this goal, Petrik (2007) suggested the Krylov basis generated by Pπ and r as features. Bellman Error Basis Functions (BEBFs) (Parr et al., 2007) iteratively build a representation by adding the Bellman error of the best solution found so far as a new feature. Parr et al. (2008) show that under certain initial conditions for BEBFs, both representations span the Krylov subspace generated by rewards. Although no general guarantees exist for arbitrary rewards, we discover that when rewards are easily predictable, orthogonal representations that span this Krylov subspace have stability guarantees.

## 3 Stability Analysis of Arbitrary Representations

To begin, we study the stability of TD(0) given an arbitrary representation. For conciseness, we call TD(0) the algorithm whose expected update is described by equation 1; this is an algorithm which may or may not be off-policy (according to ξ and Pπ), and learns a linear approximation of the value function using features Φ. The following formalizes our notion of stability.

###### Definition 3.1.

TD(0) is stable if there is a step-size η>0 such that, when taking updates according to equation 1 from any θ0∈Rd, we have θk→θ∗TD.

### 3.1 Learning Dynamics

For a sufficiently small step-size η, the discrete update of equation 1 behaves like the continuous-time dynamical system

 ∂∂t(θt−θ∗TD)=−AΦ(θt−θ∗TD), (2)

whose behaviour is driven by the iteration matrix

 AΦ=Φ⊤Ξ(I−γPπ)Φ.

Put another way, the learned parameters evolve approximately according to the linear dynamical system defined by the iteration matrix AΦ. As might be expected, TD(0) is stable if this linear dynamical system is globally stable in the usual sense (Borkar & Meyn, 2000).

The iteration matrix – and, as we shall see, the global stability of the linear dynamical system – depends on the data distribution, the representation, and, to a lesser extent, on the discount factor. It does not, however, depend on the reward function, which only affects the accuracy of the TD fixed-point solution θ∗TD.

### 3.2 Stability Criteria

To understand the behaviour of TD(0), it is useful to contrast it with gradient descent on a weighted squared loss

 ℓ(θ)=(Φθ−y)⊤Ξ(Φθ−y),

where y is a vector of supervised targets. Gradient descent on ℓ also corresponds to a linear dynamical system, albeit one whose iteration matrix is symmetric and positive definite. The behaviour of TD(0) is complicated by the fact that AΦ is not guaranteed to be positive definite or symmetric, as the matrix Ξ(I−γPπ) itself is in general neither. In fact, the documented good behaviour of TD(0) arises in contexts where AΦ itself is closer to a gradient descent iteration matrix: positive definite when the data distribution is on-policy (Tsitsiklis & Van Roy, 1996), and symmetric when the Markov chain described by Pπ is reversible (Ollivier, 2018).

Following a well-known result from linear system theory (see e.g. Zadeh & Desoer, 2008), the asymptotic behavior of TD(0) more generally depends on the eigenvalues of the iteration matrix.

###### Proposition 3.1.

TD(0) is stable if and only if the eigenvalues of the iteration matrix AΦ have positive real components, that is

 Spec(AΦ)⊂C+:={z:Re(z)>0}.

We say that a particular choice of representation Φ is stable for (Pπ,Ξ,γ) when AΦ satisfies the above condition.

###### Proof.

See Appendix B for all proofs. ∎

Whenever the transition matrix, data distribution, and discount factor are evident, we will refer to Φ simply as a stable representation.
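The criterion can be illustrated on a minimal off-policy counterexample in the spirit of the two-state MDP of Tsitsiklis & Van Roy (1996): a single feature taking values 1 and 2, both states transitioning to the second, and a uniform data distribution. The 1×1 iteration matrix then has a negative eigenvalue, so the expected TD(0) update diverges for every positive step size. A sketch:

```python
import numpy as np

gamma = 0.95
P = np.array([[0.0, 1.0],
              [0.0, 1.0]])      # both states transition to state 2
Phi = np.array([[1.0],
                [2.0]])         # single feature: phi(s1) = 1, phi(s2) = 2
Xi = np.diag([0.5, 0.5])        # uniform data distribution (off-policy:
                                # the stationary distribution is (0, 1))

A = Phi.T @ Xi @ (np.eye(2) - gamma * P) @ Phi
assert A[0, 0] < 0              # Spec(A) not contained in C+: unstable

# The expected update (with zero reward) blows up for any eta > 0.
theta, eta = 1.0, 0.1
for _ in range(1000):
    theta -= eta * A[0, 0] * theta
assert abs(theta) > 1e6
```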

### 3.3 Effect of Subspace Parametrization

When measuring the approximation error that arises from a particular representation Φ, it suffices to consider the subspace spanned by the columns of Φ. It therefore makes no difference whether these columns are orthogonal (corresponding, informally speaking, to uncorrelated features) or not. By contrast, we now show that the stability of the learning process does depend on how the linear subspace spanned by Φ is parametrized.

Recall that Φ is orthogonal if Φ⊤ΞΦ=I. As it turns out, the stability of an orthogonal representation is determined by the induced transition matrix ΠPπΠ, which describes how next-state features affect the TD(0) value estimates.

###### Proposition 3.2.

An orthogonal representation Φ is stable if and only if the real part of the eigenvalues of the induced transition matrix is bounded above, according to

 Spec(ΠPπΠ)⊂{z∈C:Re(z)<1/γ}.

In particular, Φ is stable if ρ(ΠPπΠ)<1/γ.

Although the original transition matrix satisfies this spectral radius condition, since ρ(Pπ)=1<1/γ, the induced transition matrix can have eigenvalues beyond the stable region and lead to learning instability.

More generally, a representation Φ can be decomposed into an orthogonal basis Φ′ and a reparametrization R∈Rd×d, where Φ′ is an orthogonal representation spanning the same space as Φ and R is an invertible matrix with Φ=Φ′R. The eigenvalues of the iteration matrix can be re-expressed as

 Spec(AΦ)=Spec(R⊤AΦ′R)=Spec(RR⊤AΦ′).

Despite spanning the same space, Φ and Φ′ have iteration matrices with different spectra: Spec(AΦ)≠Spec(AΦ′) in general. As a result, the stability of Φ not only depends on the spectrum of AΦ′, but also on how the reparametrization R shifts these eigenvalues. Put another way, Φ may be unstable even if its orthogonal equivalent Φ′ is stable. The classical example of divergence given by Baird (1995) can be attributed to this phenomenon. In this example, the constructed representation expresses the same value functions as a stable tabular representation, but parametrizes the space in a different way and thus induces divergence.

### 3.4 Singular Vector Representations

The singular value decomposition is an appealing approach to representation learning: choosing vectors corresponding to large singular values guarantees, in a certain measure, low approximation error (Stachenfeld et al., 2014; Behzadian et al., 2019). Unfortunately, as we now show, doing so may be inimical to stability.

We denote by ΦSVD and ΦSR the representations whose features are the top d left singular vectors of Pπ and Ψ, respectively. Recall that these vectors arise as part of a solution to a low-rank matrix approximation problem. We write ^Pπ and ^Ψ to denote the corresponding rank-d approximations.

###### Proposition 3.3 (Svd).

The representation ΦSVD is stable if and only if the low-rank approximation satisfies

 ρ(^Pπ)<1/γ.
###### Proposition 3.4 (Successor Representation).

Recall that Ψ=(I−γPπ)−1. The representation ΦSR is stable if and only if the low-rank approximation satisfies

 Spec(^Ψ)⊂C+∪{0}.

Stability of a singular vector representation requires that the low-rank approximation maintain the spectral properties of the original matrix. This implies that such representations are not stable in general – the SVD low-rank approximation is chosen to minimize the norm of the error, and the spectrum of the approximation can deviate arbitrarily from that of the original matrix (Golub & van Loan, 2013). We note that the spectral conditions hold in the limit of almost-perfect approximation, but achieving this level of accuracy in practice may require an impractical number of additional features.
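This failure mode is easy to probe: truncated SVD reconstructions of a stochastic matrix need not satisfy the condition of Proposition 3.3, even though the full matrix has ρ(Pπ)=1. A sketch (random stochastic matrix as a stand-in) checks the spectral radius condition at each rank:

```python
import numpy as np

rng = np.random.default_rng(2)
n, gamma = 10, 0.9

P = rng.random((n, n)); P /= P.sum(axis=1, keepdims=True)
assert abs(max(abs(np.linalg.eigvals(P))) - 1.0) < 1e-8   # rho(P) = 1

U, s, Vt = np.linalg.svd(P)
stable_ranks = []
for k in range(1, n + 1):
    P_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # rank-k approximation
    if max(abs(np.linalg.eigvals(P_hat))) < 1.0 / gamma:
        stable_ranks.append(k)

# The full-rank "approximation" recovers P itself, so rank n always passes;
# intermediate ranks may or may not, depending on how the spectrum deviates.
assert n in stable_ranks
```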

## 4 Representation Learning with Stability Guarantees

Our analysis of singular vector representations shows that representations optimizing for other measures, such as approximation error, may lose properties of the transition matrix needed for stability. In this section, we study representations that are constrained, either in expressibility or in spectrum, to ensure stability.

### 4.1 Invariant Representations

We first consider representations whose induced transition matrix preserves the eigenvalues of the transition matrix to guarantee stability. These representations are closely linked to invariant subspaces of value functions that are closed under the transition dynamics of the policy.

###### Definition 4.1.

A representation Φ is Pπ-invariant if its corresponding linear subspace is closed under Pπ, that is

 Span(PπΦ)⊆Span(Φ).

Pπ-invariant subspaces are generated by the eigenspaces of Pπ, and so invariant representations provide a natural way to reflect the geometry of the transition matrix. For these representations, we show that any eigenvalue of the induced transition matrix is also an eigenvalue of the transition matrix; this constraint ensures that invariant representations are always stable.

###### Theorem 4.1.

An orthogonal invariant representation Φ satisfies

 Spec(ΠPπΠ)⊆Spec(Pπ)∪{0}

and is therefore stable.

Parr et al. (2008) studied the quality of the TD fixed-point solution on invariant subspaces, and found it to directly correlate with how well the subspace models reward. Our findings on stability emphasize the importance of their result – with invariant representations that can predict reward, good value functions not only exist, but are also reliably discovered by TD(0).

Although estimating the eigenvectors of a nonsymmetric matrix is numerically unstable, finding orthogonal bases for its eigenspaces can be done tractably, for example through the Schur decomposition.

###### Definition 4.2.

Let A∈Cn×n be a complex matrix. A Schur decomposition of A, written A=UTU∗, is one in which T is upper triangular and U is unitary. For any d, Span(u1,…,ud) is an A-invariant subspace.

The Schur decomposition of Pπ provides a sequence of vectors that span invariant subspaces, and can be constructed so that the first d basis vectors span the top d-dimensional eigenspace of Pπ. We define the representation using the first d Schur basis vectors to be the Schur representation.

When the transition matrix is reversible and data is on-policy, the Schur representation coincides with proto-value functions, and consequently also with the successor representation (Machado et al., 2018). Unlike singular vector representations, the Schur representation preserves the spectrum of the transition matrix at every step, and always guarantees stability.

###### Corollary 4.1.1.

The Schur representation is invariant and thus stable.
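A Schur representation can be constructed directly with standard dense linear algebra, e.g. `scipy.linalg.schur`. The sketch below uses the complex Schur form for brevity (the real Schur form would avoid complex-valued features, and scipy's `sort` argument could be used to select the top eigenspace) on a random stochastic matrix standing in for Pπ; with a uniform data distribution, the leading Schur vectors give an exactly invariant, hence stable, representation:

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(3)
n, d, gamma = 8, 3, 0.9
P = rng.random((n, n)); P /= P.sum(axis=1, keepdims=True)

# Complex Schur form P = Q T Q*: any leading block of Schur vectors
# spans an invariant subspace of P.
T, Q = schur(P, output="complex")
Phi = Q[:, :d]

# Invariance: P Phi = Phi T[:d, :d], so Span(P Phi) lies in Span(Phi).
assert np.linalg.norm(P @ Phi - Phi @ T[:d, :d]) < 1e-10

# With uniform xi, the iteration matrix has eigenvalues (1 - gamma*lam)/n
# for lam in Spec(P); since |lam| <= 1, all real parts are positive.
A = Phi.conj().T @ (np.eye(n) - gamma * P) @ Phi / n
assert np.linalg.eigvals(A).real.min() > 0
```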

A partial Schur basis can be constructed through orthogonal iteration, a generalized variant of power iteration.

###### Proposition 4.1 (Golub & van Loan (2013)).

Let λ1,…,λn be the ordered eigenvalues of Pπ. If |λd|>|λd+1| and the initial basis Φ0 is not orthogonal to the top eigenspace, the sequence generated via orthogonal iteration is

 Φk=Orthog(Span(PπΦk−1)),

where Orthog finds an orthogonal basis. As k→∞, Span(Φk) converges to the unique top d-dimensional eigenspace of Pπ.

In Section 5, we will see that the orthogonal iteration scheme can be approximated using a loss function and a target network (Mnih et al., 2015), and subsequently minimized with stochastic gradient descent, making it a potentially important tool for learning stable representations in practice.

### 4.2 Approximately Invariant Representations

In the previous section, we studied invariant representations, which are constrained to exactly preserve the eigenvalues of the transition matrix. We now discuss a relaxation to approximate invariance, under which the spectrum of the induced matrix deviates from that of the transition matrix by a controlled amount while still preserving stability. We find that approximate invariance has interesting implications for representations that span a Krylov subspace generated by rewards (Petrik, 2007; Parr et al., 2007).

###### Definition 4.3.

A representation Φ is ϵ-invariant if

 maxv∈Span(Φ) ∥ΠPπv−Pπv∥Ξ/∥v∥Ξ ≤ ϵ.

An approximately invariant representation spans a space in which the transition dynamics are not fully closed, but approximately so, as measured by the Ξ-norm. We provide a simple condition for when an ϵ-invariant representation is stable, under the assumption that the transition matrix is diagonalizable. If Pπ is diagonalizable with eigenbasis A, the distance between the eigenvalues of the induced transition matrix and those of the original transition matrix can be bounded by a function of a) ϵ, the degree of approximate invariance, and b) the condition number κΞ(A) of the eigenbasis (Trefethen & Embree, 2005).

###### Theorem 4.2.

Let Φ be an orthogonal and ϵ-invariant representation for Pπ. If Pπ is diagonalizable with eigenbasis A, then Φ is stable if

 ϵ < ((1−γ)/γ)⋅(1/κΞ(A)).

This bound is quite stringent, especially for discount factors close to one and ill-conditioned eigenvector bases, but may be improved if the transition matrix has a special structure. For the general setting when the transition matrix is not diagonalizable, similar but more complicated bounds exist (Shi & Wei, 2012).

Approximately invariant representations are of particular interest when studying the Krylov subspace generated by rewards,

 Kd(Pπ,r)=Span{r,Pπr,…,(Pπ)d−1r}.

Representations that span this space admit a simple form of approximate invariance.

###### Proposition 4.2.

A representation spanning Kd(Pπ,r) is ϵ-invariant if

 ∥ΠPπv−Pπv∥Ξ/∥v∥Ξ ≤ ϵ,

where v=(Pπ)d−1r, and Π is the projection onto the d-dimensional Krylov subspace Kd(Pπ,r).

Orthogonal representations of this Krylov subspace are approximately invariant if they can predict the reward at the d-th timestep well from the rewards attained in the first d−1 timesteps. For rewards that diffuse through the environment rapidly and can be predicted easily, an orthogonal basis of the Krylov space generated by rewards is approximately invariant and thus stable. Challenging environments with sparse rewards and temporal separation, however, may require a prohibitively large Krylov space to guarantee stability. Note that there is an important distinction between orthogonal representations spanning a Krylov subspace and the Krylov basis itself: for most practical applications, rewards are highly correlated, and because of the challenges of parametrization, the latter can be unstable.
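An orthogonal basis for the Krylov subspace can be obtained from a QR factorization of the matrix [r, Pπr, …, (Pπ)d−1r]. The sketch below (random P and r as stand-ins) builds such a basis and checks the structural fact behind Proposition 4.2: every Krylov direction except the last maps back into the subspace exactly, so the invariance defect is carried entirely by the single vector (Pπ)d−1r:

```python
import numpy as np

rng = np.random.default_rng(5)
n, d = 8, 4
P = rng.random((n, n)); P /= P.sum(axis=1, keepdims=True)
r = rng.random(n)

# Krylov matrix [r, P r, ..., P^{d-1} r] and an orthogonal basis of its span.
K = np.column_stack([np.linalg.matrix_power(P, i) @ r for i in range(d)])
Phi, _ = np.linalg.qr(K)
Pi = Phi @ Phi.T                      # projection onto Span(K)

# P maps the first d-1 Krylov vectors back into the subspace exactly...
assert np.linalg.norm((np.eye(n) - Pi) @ P @ K[:, :-1]) < 1e-6

# ...so any epsilon-invariance defect comes from v = P^{d-1} r alone.
v = K[:, -1]
eps = np.linalg.norm((np.eye(n) - Pi) @ P @ v) / np.linalg.norm(v)
assert np.isfinite(eps)
```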

### 4.3 Positive-Definite Representations

Invariant representations are stable because the spectrum of the projected transitions is constrained to closely mimic the eigenvalues of the transition matrix. What we call positive definite representations instead guarantee stability by constraining the set of expressible value functions to lie within a safe set. Positive definite representations are stable regardless of parametrization, unlike any family of representations discussed so far.

###### Definition 4.4.

The set of positive-definite value functions is

 SPD={ v∈Rn  |  ⟨v,Pπv⟩Ξ < γ−1∥v∥2Ξ }.

Note that SPD is not necessarily closed under addition. The two-state MDP presented by Tsitsiklis & Van Roy (1996) where TD(0) diverges can be interpreted through the lens of this set. For this example, the state representation only expresses value functions outside of SPD, which “grow” too quickly under the transition dynamics and consequently lead to divergence. We focus on representations whose span falls within this set of safe value functions.

###### Definition 4.5.

We say that a representation Φ is positive-definite if

 Span(Φ)⊆SPD.

Note that a positive-definite representation remains so under reparametrization, unlike in the general case. In the special case of on-policy data, ⟨v,Pπv⟩Ξ≤∥v∥2Ξ for all v, and all representations are positive-definite (Tsitsiklis & Van Roy, 1996).

###### Theorem 4.3.

A positive-definite representation Φ has a positive-definite iteration matrix AΦ, and is thus stable.

The Laplacian representation, which computes the spectral eigendecomposition of the symmetrized transition matrix

 K:=(1/2)(Pπ+Ξ−1Pπ⊤Ξ)=UΛU⊤Ξ,

provides an interesting bifurcation of value functions into those that are positive-definite and those that are not. As a consequence of Theorem 4.3, a stable representation is obtained by using the eigenvectors corresponding to eigenvalues smaller than γ−1.

###### Proposition 4.3.

Let λ1≥⋯≥λn be the eigenvalues of K, in decreasing order, and u1,…,un the corresponding eigenvectors. Define d∗ as the smallest integer such that λd∗<γ−1. For any i≥0, the safe Laplacian representation, defined as

 Φ=[ud∗,ud∗+1,…,ud∗+i],

is positive-definite and stable.

While including eigenvectors for larger eigenvalues does not guarantee divergence, the basis [u1,…,ud∗−1] is unstable (see appendix). When the data is on-policy, all eigenvalues of K are below the threshold γ−1, and the safe Laplacian corresponds exactly to the original representation.
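The construction can be carried out with standard dense linear algebra. The sketch below (random off-policy Pπ and ξ as stand-ins) diagonalizes the symmetrized matrix through its symmetric conjugate Ξ^{1/2}KΞ^{−1/2}, keeps the Ξ-orthogonal eigenvectors with eigenvalues below γ−1, and confirms that the resulting iteration matrix is positive definite:

```python
import numpy as np

rng = np.random.default_rng(6)
n, gamma = 8, 0.9
P = rng.random((n, n)); P /= P.sum(axis=1, keepdims=True)
xi = rng.random(n); xi /= xi.sum()            # off-policy data distribution
Xi, Xi_h = np.diag(xi), np.diag(np.sqrt(xi))

# K = (P + Xi^{-1} P^T Xi)/2 is self-adjoint under <.,.>_Xi, so we can
# diagonalize the ordinary symmetric matrix Xi^{1/2} K Xi^{-1/2} instead.
M = Xi_h @ P @ np.linalg.inv(Xi_h)
lam, W = np.linalg.eigh(0.5 * (M + M.T))
U = np.linalg.inv(Xi_h) @ W                   # Xi-orthogonal eigenvectors of K

# Safe Laplacian: drop eigenvectors whose eigenvalue reaches 1/gamma.
Phi = U[:, lam < 1.0 / gamma]
assert Phi.shape[1] > 0

A = Phi.T @ Xi @ (np.eye(n) - gamma * P) @ Phi
assert np.linalg.eigvalsh(0.5 * (A + A.T)).min() > 0   # positive definite
```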

We finish our discussion with a cautionary point. Although positive-definite representations admit amenable optimization properties, such as invariance to reparametrization and monotonic convergence, they can only express value functions that satisfy a growth condition. Under on-policy sampling this growth condition is nonrestrictive, but as the policy deviates from the data distribution, the expressiveness of positive-definite representations reduces greatly.

## 5 Experiments

We complement our theoretical results with an experimental evaluation, focusing on the following questions:

• How closely do the theoretical conditions we describe match stability requirements in practice?

• Can stable representations be learned using samples?

• Can they be learned using neural networks?

We conduct our study in the four-room domain (Sutton et al., 1999). We augment this domain with a task where the agent must reach the top right corner to receive a reward of +1 (Figure 1). The policy evaluation problem is to accurately estimate the value function of a near-optimal policy from data consisting of trajectories sampled by a uniform policy.

We are interested in the usefulness of the representation learning schemes summarized in Table 1 as a function of the number of features d that are used. We measure both the stability of the learned representation and its accuracy in estimating the greedy policy with respect to the fixed value function. We chose the latter measure as it is more informative than value approximation error when the number of features is small. See Appendix C for full details about the experimental setup.

Exact Representations: We first consider the quality of the representations in exact form, assuming access to the true transition matrix and reward function (Figure 2). We find that the general empirical profiles for stability match our theoretical characterizations. Singular vectors of the successor representation have low error but are unstable for most small numbers of features. Although the Krylov basis of rewards and its orthogonalization both have the same estimation errors, they have drastically different stability profiles, confirming our analysis from Section 4.2. Amongst the proposed methods that consistently produce stable representations, the Schur basis admits low error and, with enough features, is fully expressive. In contrast, the safe Laplacian representation takes an irrecoverable performance hit, as it discards the top eigenvectors of the symmetrized transition matrix, which contain reward-relevant information.

Estimation with Samples: In practice, representations must be learned from finite data. To test the numerical robustness of the representation learning schemes, we construct an empirical transition matrix from a variable number of samples and learn a representation using this matrix.

We measure the difference between the subspaces spanned by the estimated and true representation (Figure 3). We find that estimating the Schur representation can be more challenging than the other methods, and requires an order of magnitude more data to compute accurately than the singular-vector and spectral representations. This is a well-known problem in numerical linear algebra, as eigenspaces of nonsymmetric matrices (Schur) are more sensitive to perturbation and estimation error than eigenspaces of symmetric matrices (Spectral, Svd). This implies a three-way tradeoff between stability, approximation error, and ease of estimation when choosing a representation for a general environment. The successor representation is unstable, the safe Laplacian is limited in its approximation power, and the Schur decomposition is harder to learn from samples. The orthogonal Krylov basis emerges as a strong method by these measures, but requires additional knowledge in the form of the reward function.

Estimation with Neural Networks: In our final set of experiments, we show that the Schur representation and the orthogonal Krylov representation can be learned by neural networks by performing stochastic gradient descent on certain auxiliary objectives.

It has been noted previously that training a representation network with a final linear layer to predict features causes the neural network to learn a basis for the target features (Bellemare et al., 2019). A Krylov representation can then be learned by predicting the rewards received over the next several time steps, with one prediction per feature. Similarly, orthogonal iteration for learning the Schur representation (Proposition 3.2) can be approximated with a two-timescale algorithm that (a) at each step, predicts the feature values of a fixed target representation network at the next time step, and (b) infrequently refreshes the target representation network with the current one. As our stability guarantees hold for orthogonal representations, the neural network must learn uncorrelated features, which can be enforced explicitly or with a penalty-based orthogonality loss (Wu et al., 2018). We fully describe the auxiliary objectives and provide implementation details in Appendix C.
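In the tabular limit, this two-timescale procedure reduces to subspace (orthogonal) iteration: fitting the network to the target features plays the role of multiplying by the transition matrix, and the orthogonality constraint plays the role of the QR step. A sketch of that idealized loop, with exact least-squares fits standing in for gradient steps (the reversible random-walk chain below is our own toy construction, chosen so the spectrum is real):

```python
import numpy as np

rng = np.random.default_rng(7)
n, d = 10, 3
# Reversible chain (real spectrum): random symmetric weights, row-normalized.
W = rng.random((n, n))
W = (W + W.T) / 2
P = W / W.sum(axis=1, keepdims=True)

Phi, _ = np.linalg.qr(rng.standard_normal((n, d)))
for _ in range(2000):              # each pass = one "target refresh"
    target = P @ Phi               # (a) predict next-step target features
    Phi, _ = np.linalg.qr(target)  # (b) refresh target, enforce orthogonality

# Span(Phi) is now approximately P-invariant.
residual = np.linalg.norm(P @ Phi - Phi @ (Phi.T @ P @ Phi))
print(residual)
```

In the neural-network version, step (a) becomes stochastic gradient steps on the predictive loss and step (b) becomes the periodic target-network swap plus renormalization.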

Figure 4 demonstrates that these predictive losses can be optimized easily with neural networks and can learn stable, approximately invariant representations. We note that this auxiliary task of predicting future latent states has been heuristically proposed before (François-Lavet et al., 2018; Gelada et al., 2019) as a way to reduce approximation error. Our results indicate that such auxiliary tasks may not only help reduce approximation error, but more importantly, can mitigate divergence in the learning process and provide for stable optimization.

## 6 Conclusion

We have presented an analysis of stability guarantees for value-function learning under various representation learning procedures. Our analysis provides conditions for stability of many algorithms that learn features from transitions, and demonstrates how representation learning procedures constrained to respect the geometry of the transition matrix can induce stability. We demonstrated that the Schur decomposition and orthogonal Krylov bases are rich representations that mitigate divergence in off-policy value function learning, and further showed that they can be learned using stochastic gradient descent on a loss function.

Our work provides formal evidence that representation learning can prevent divergence without sacrificing approximation quality. To carry our results to the full practical case, stability should be extended to the sequence of policies that are encountered during policy iteration. One should also consider the effects of learning value functions and representations concurrently, and the ensuing interactions in the representation. Our work suggests that studying stable representations in these contexts can be a promising avenue forward for the development of principled auxiliary tasks for stable deep reinforcement learning.

## Acknowledgements

We would like to thank Nicolas Le Roux, Marlos C. Machado, Courtney Paquette, Fabian Pedregosa, Doina Precup, and Ahmed Touati for helpful discussions and contributions. We additionally thank Marlos C. Machado and Courtney Paquette for constructive feedback on an earlier manuscript.

## References

• Baird (1995) Baird, L. C. Residual algorithms: Reinforcement learning with function approximation. In ICML, 1995.
• Barnard (1993) Barnard, E. Temporal-difference methods and Markov models. IEEE Transactions on Systems, Man, and Cybernetics, 1993.
• Behzadian et al. (2019) Behzadian, B., Gharatappeh, S., and Petrik, M. Fast feature selection for linear value function approximation. In ICAPS, 2019.
• Bellemare et al. (2017) Bellemare, M. G., Dabney, W., and Munos, R. A distributional perspective on reinforcement learning. In ICML, 2017.
• Bellemare et al. (2019) Bellemare, M. G., Dabney, W., Dadashi, R., Taïga, A. A., Castro, P. S., Roux, N. L., Schuurmans, D., Lattimore, T., and Lyle, C. A geometric perspective on optimal representations for reinforcement learning. In Advances in Neural Information Processing Systems, 2019.
• Benveniste et al. (1990) Benveniste, A., Priouret, P., and Métivier, M. Adaptive Algorithms and Stochastic Approximations. Springer-Verlag, Berlin, Heidelberg, 1990. ISBN 0387528946.
• Bertsekas (2011) Bertsekas, D. P. Approximate policy iteration: a survey and some new methods. Journal of Control Theory and Applications, 9:310–335, 2011.
• Bertsekas (2018) Bertsekas, D. P. Feature-based aggregation and deep reinforcement learning: a survey and some new implementations. IEEE/CAA Journal of Automatica Sinica, 6:1–31, 2018.
• Bodnar et al. (2019) Bodnar, C., Li, A., Hausman, K., Pastor, P., and Kalakrishnan, M. Quantile QT-opt for risk-aware vision-based robotic grasping. arXiv, 2019.
• Borkar & Meyn (2000) Borkar, V. S. and Meyn, S. P. The o.d.e. method for convergence of stochastic approximation and reinforcement learning. SIAM J. Control and Optimization, 38:447–469, 2000.
• Bradtke & Barto (1996) Bradtke, S. J. and Barto, A. G. Linear least-squares algorithms for temporal difference learning. Machine Learning, 22:33–57, 1996.
• Cabi et al. (2019) Cabi, S., Colmenarejo, S. G., Novikov, A., Konyushkova, K., Reed, S., Jeong, R., Zolna, K., Aytar, Y., Budden, D., Vecerik, M., Sushkov, O., Barker, D., Scholz, J., Denil, M., de Freitas, N., and Wang, Z. A framework for data-driven robotics. arXiv, 2019.
• Chung et al. (2019) Chung, W., Nath, S., Joseph, A. G., and White, M. Two-timescale networks for nonlinear value function approximation. In International Conference on Learning Representations, 2019.
• Dalal et al. (2017) Dalal, G., Szörényi, B., Thoppe, G., and Mannor, S. Finite sample analyses for td(0) with function approximation. In AAAI, 2017.
• Dann et al. (2014) Dann, C., Neumann, G., and Peters, J. Policy evaluation with temporal differences: a survey and comparison. J. Mach. Learn. Res., 15:809–883, 2014.
• Dayan (1993) Dayan, P. Improving generalization for temporal difference learning: The successor representation. Neural Computation, 5:613–624, 1993.
• François-Lavet et al. (2018) François-Lavet, V., Bengio, Y., Precup, D., and Pineau, J. Combined reinforcement learning via abstract representations. arXiv, 2018.
• Gelada et al. (2019) Gelada, C., Kumar, S., Buckman, J., Nachum, O., and Bellemare, M. G. DeepMDP: Learning continuous latent space models for representation learning. In Proceedings of the International Conference on Machine Learning, 2019.
• Golub & van Loan (2013) Golub, G. H. and van Loan, C. F. Matrix Computations. JHU Press, fourth edition, 2013. ISBN 1421407949 9781421407944.
• Gordon (1995) Gordon, G. J. Stable function approximation in dynamic programming. In ICML, 1995.
• Jaderberg et al. (2016) Jaderberg, M., Mnih, V., Czarnecki, W., Schaul, T., Leibo, J. Z., Silver, D., and Kavukcuoglu, K. Reinforcement learning with unsupervised auxiliary tasks. ArXiv, abs/1611.05397, 2016.
• Lagoudakis & Parr (2003) Lagoudakis, M. G. and Parr, R. Least-squares policy iteration. J. Mach. Learn. Res., 4:1107–1149, 2003.
• Levine et al. (2017) Levine, N., Zahavy, T., Mankowitz, D., Tamar, A., and Mannor, S. Shallow updates for deep reinforcement learning. In Advances in Neural Information Processing Systems, 2017.
• Machado et al. (2018) Machado, M. C., Rosenbaum, C., Guo, X., Liu, M., Tesauro, G., and Campbell, M. Eigenoption discovery through the deep successor representation. In International Conference on Learning Representations, 2018.
• Maei et al. (2009) Maei, H. R., Szepesvari, C., Bhatnagar, S., Precup, D., Silver, D., and Sutton, R. S. Convergent temporal-difference learning with arbitrary smooth function approximation. In NIPS, 2009.
• Mahadevan & Maggioni (2007) Mahadevan, S. and Maggioni, M. Proto-value functions: A laplacian framework for learning representation and control in markov decision processes. J. Mach. Learn. Res., 8:2169–2231, 2007.
• Mnih et al. (2015) Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
• Ollivier (2018) Ollivier, Y. Approximate temporal difference learning is a gradient descent for reversible policies. ArXiv, abs/1805.00869, 2018.
• Parr et al. (2007) Parr, R., Painter-Wakefield, C., Li, L., and Littman, M. L. Analyzing feature generation for value-function approximation. In ICML ’07, 2007.
• Parr et al. (2008) Parr, R., Li, L., Taylor, G., Painter-Wakefield, C., and Littman, M. L. An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning. In ICML ’08, 2008.
• Pathak et al. (2017) Pathak, D., Agrawal, P., Efros, A. A., and Darrell, T. Curiosity-driven exploration by self-supervised prediction. In 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 488–489, 2017.
• Petrik (2007) Petrik, M. An analysis of laplacian methods for value function approximation in mdps. In Proceedings of the 20th International Joint Conference on Artificial Intelligence, IJCAI’07, pp. 2574–2579, San Francisco, CA, USA, 2007. Morgan Kaufmann Publishers Inc.
• Puterman (1994) Puterman, M. L. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., USA, 1st edition, 1994. ISBN 0471619779.
• Shi & Wei (2012) Shi, X. and Wei, Y. A sharp version of bauer–fike’s theorem. Journal of Computational and Applied Mathematics, 236(13):3218 – 3227, 2012. ISSN 0377-0427.
• Stachenfeld et al. (2014) Stachenfeld, K. L., Botvinick, M., and Gershman, S. J. Design principles of the hippocampal cognitive map. In Advances in Neural Information Processing Systems, 2014.
• Sutton & Barto (2018) Sutton, R. S. and Barto, A. G. Reinforcement learning: An introduction. MIT Press, 2nd edition, 2018.
• Sutton et al. (1999) Sutton, R. S., Precup, D., and Singh, S. P. Between mdps and semi-mdps: A framework for temporal abstraction in reinforcement learning. Artif. Intell., 112:181–211, 1999.
• Trefethen & Embree (2005) Trefethen, L. N. and Embree, M. Spectra and Pseudospectra: The Behavior of Nonnormal Matrices and Operators. Princeton University Press, 2005.
• Tsitsiklis & Roy (1996) Tsitsiklis, J. N. and Roy, B. V. Analysis of temporal-difference learning with function approximation. In NIPS, 1996.
• van Hasselt et al. (2018) van Hasselt, H., Doron, Y., Strub, F., Hessel, M., Sonnerat, N., and Modayil, J. Deep reinforcement learning and the deadly triad. ArXiv, abs/1812.02648, 2018.
• Vecerik et al. (2019) Vecerik, M., Sushkov, O., Barker, D., Rothörl, T., Hester, T., and Scholz, J. A practical approach to insertion with variable socket position using deep reinforcement learning. 2019.
• Wu et al. (2018) Wu, Y., Tucker, G., and Nachum, O. The laplacian in rl: Learning representations with efficient approximations. ArXiv, abs/1810.04586, 2018.
• Yu & Bertsekas (2009) Yu, H. and Bertsekas, D. P. Basis function adaptation methods for cost approximation in mdp. In Proceedings of the IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2009.
• Zadeh & Desoer (2008) Zadeh, L. and Desoer, C. Linear system theory: the state space approach. Courier Dover Publications, 2008.

## Appendix A Linear Algebra and Spectral Theory

### A.1 Inner Products

A positive-definite symmetric matrix $\Xi$ induces an inner product and norm on $\mathbb{R}^n$. Specifically, the inner product is written as $\langle x, y \rangle_\Xi = x^\top \Xi y$, and the corresponding norm is $\|x\|_\Xi = \sqrt{\langle x, x \rangle_\Xi}$. This corresponds to a Hilbert space $(\mathbb{R}^n, \langle \cdot, \cdot \rangle_\Xi)$. In our work, we equip $\mathbb{R}^n$ (where $n$ is the number of state-action pairs) with the inner product induced by the data distribution $\Xi$. We also equip $\mathbb{R}^d$ (the parameter space) with the usual Euclidean inner product.

Most definitions and constructions involving the Euclidean inner product generalize to arbitrary Hilbert spaces; we describe some of them on $(\mathbb{R}^n, \langle \cdot, \cdot \rangle_\Xi)$. Two vectors $x, y$ are orthogonal if $\langle x, y \rangle_\Xi = 0$. A matrix $\Phi$ is orthogonal if its columns have unit norm and are orthogonal to one another: $\Phi^\top \Xi \Phi = I$. Transposes and symmetric matrices generalize through the adjoint of a matrix $A$, written as $A^* = \Xi^{-1} A^\top \Xi$. A matrix is self-adjoint if $A = A^*$, and for matrices that are not self-adjoint, the symmetric component is given by $\tfrac{1}{2}(A + A^*)$. We write $\|A\|_\Xi$ for the matrix norm induced by the corresponding vector norm.
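For concreteness, the defining adjoint identity $\langle Av, w \rangle_\Xi = \langle v, A^* w \rangle_\Xi$ with $A^* = \Xi^{-1} A^\top \Xi$ can be verified numerically. A small sketch with randomly drawn $\Xi$ and $A$ (our own illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
xi = rng.dirichlet(np.ones(n))          # positive weights on the diagonal
Xi = np.diag(xi)

def inner(u, v):                        # <u, v>_Xi = u^T Xi v
    return u @ Xi @ v

A = rng.standard_normal((n, n))
A_star = np.linalg.inv(Xi) @ A.T @ Xi   # adjoint w.r.t. the Xi-inner product
v, w = rng.standard_normal(n), rng.standard_normal(n)
print(np.isclose(inner(A @ v, w), inner(v, A_star @ w)))
```

The symmetric component $\tfrac{1}{2}(A + A^*)$ is then self-adjoint by construction, which is what the safe Laplacian analysis relies on.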

Matrix decompositions for a matrix $A$ can be revisited with respect to this inner product.

• Spectral Decomposition: If $A$ is self-adjoint, it admits a decomposition $A = U \Lambda U^{-1}$, where $U$ is an orthogonal matrix whose columns are eigenvectors of $A$, and $\Lambda$ a diagonal matrix with the corresponding eigenvalues.

• SVD: $A$ admits a decomposition $A = U \Sigma V^\top \Xi$, where $U$ is an orthogonal matrix whose columns are the left singular vectors of $A$, $V$ is an orthogonal matrix whose columns are the right singular vectors of $A$, and $\Sigma$ a diagonal matrix with the corresponding singular values. Letting $U_1, V_1$ correspond to the first $k$ singular vectors and $\Sigma_1$ the diagonal matrix with the corresponding singular values, the low-rank approximation $U_1 \Sigma_1 V_1^\top \Xi$ minimizes $\|A - B\|_\Xi$ amongst all rank-$k$ matrices $B$.

### A.2 Eigenvalues

We define the eigenvalues of $A$ to be the roots of the characteristic polynomial $\det(A - \lambda I)$. Some eigenvalues may correspond to a multiple root; we refer to this multiplicity as the algebraic multiplicity. Every eigenvalue $\lambda$ corresponds to an eigenspace $E_\lambda$ of eigenvectors with this eigenvalue. If the algebraic multiplicity of any eigenvalue does not equal the dimensionality of $E_\lambda$, then $A$ is said to be defective. Otherwise, the matrix is diagonalizable as $A = U \Lambda U^{-1}$, where $U$ is a basis of eigenvectors of $A$, and $\Lambda$ the diagonal matrix of corresponding eigenvalues.

We write $\mathrm{Spec}(A)$ to denote the set of eigenvalues (the spectrum) of the matrix $A$. The spectral radius of a matrix is the maximum magnitude of its eigenvalues, written as $\rho(A)$. For two matrices $A, B$, we have the following cyclicity: $\mathrm{Spec}(AB) \cup \{0\} = \mathrm{Spec}(BA) \cup \{0\}$. As a consequence, we also have that $\rho(AB) = \rho(BA)$. We utilize this cyclicity heavily in the ensuing proofs.
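The cyclicity property is easy to check numerically: the nonzero eigenvalues of $AB$ and $BA$ coincide, with the larger product carrying extra zeros. A quick sketch (random rectangular factors of our own choosing):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))
B = rng.standard_normal((3, 5))

ev_AB = np.linalg.eigvals(A @ B)    # 5 eigenvalues (two are ~0)
ev_BA = np.linalg.eigvals(B @ A)    # 3 eigenvalues

nonzero = ev_AB[np.abs(ev_AB) > 1e-10]
# Every eigenvalue of BA appears among the nonzero eigenvalues of AB.
print(all(min(abs(nonzero - z)) < 1e-8 for z in ev_BA))
# In particular, the spectral radii agree.
print(np.isclose(max(abs(ev_AB)), max(abs(ev_BA))))
```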

The perturbation of the eigenvalues of a diagonalizable matrix can be bounded simply via the Bauer-Fike theorem. Specifically, if $A$ is diagonalizable as $A = V \Lambda V^{-1}$, then every eigenvalue $\mu$ of the perturbed matrix $A + E$ can be bounded in distance from the original eigenvalues as $\min_{\lambda \in \mathrm{Spec}(A)} |\mu - \lambda| \le \kappa(V) \|E\|$, where $\kappa(V) = \|V\| \|V^{-1}\|$ is the condition number of the eigenbasis. As a simple corollary of the Bauer-Fike theorem, we have that $\rho(A + E) \le \rho(A) + \kappa(V) \|E\|$.
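The Bauer-Fike bound can likewise be illustrated numerically (a sketch on a random instance of our own; `kappa` is the condition number of the eigenvector basis):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((n, n))
lam, V = np.linalg.eig(A)                  # A = V diag(lam) V^{-1}
kappa = np.linalg.cond(V)                  # condition number of the eigenbasis

E = 1e-3 * rng.standard_normal((n, n))     # small perturbation
mu = np.linalg.eigvals(A + E)

# Every perturbed eigenvalue is within kappa * ||E||_2 of an original one.
worst = max(min(abs(m - lam)) for m in mu)
print(worst <= kappa * np.linalg.norm(E, 2))
```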

## Appendix B Proofs

See Proposition 3.1.

###### Proof of Proposition 3.1.

We review the update taken by TD(0) (Equation 1), rewritten to express the connection to the implied iteration matrix $A_\Phi = \Phi^\top \Xi (I - \gamma P^\pi) \Phi$. Notice that $A_\Phi \theta^*_{TD} = \Phi^\top \Xi r$.

$$\begin{aligned}
\theta_{k+1} - \theta^*_{TD} &= \theta_k - \eta\left(\Phi^\top \Xi (I - \gamma P^\pi) \Phi\, \theta_k - \Phi^\top \Xi r\right) - \theta^*_{TD} \\
&= \theta_k - \theta^*_{TD} - \eta\left(A_\Phi \theta_k - A_\Phi \theta^*_{TD}\right) \\
&= (I - \eta A_\Phi)(\theta_k - \theta^*_{TD})
\end{aligned}$$

Unrolling the iteration, the error to the optimal solution takes the form $\theta_k - \theta^*_{TD} = (I - \eta A_\Phi)^k (\theta_0 - \theta^*_{TD})$.

This iteration converges from any initialization if and only if the spectral radius is bounded by one: $\rho(I - \eta A_\Phi) < 1$.

From here, we can easily show that TD(0) is stable if and only if $\mathrm{Spec}(A_\Phi) \subset \mathbb{C}_+$. If there is some step-size $\eta$ for which $\rho(I - \eta A_\Phi) < 1$, then every $\lambda \in \mathrm{Spec}(A_\Phi)$ satisfies $|1 - \eta \lambda| < 1$, which forces $\mathrm{Re}(\lambda) > 0$, so $\mathrm{Spec}(A_\Phi) \subset \mathbb{C}_+$. Similarly, if $\mathrm{Spec}(A_\Phi) \subset \mathbb{C}_+$, then letting $\eta < \min_{\lambda \in \mathrm{Spec}(A_\Phi)} 2\,\mathrm{Re}(\lambda)/|\lambda|^2$ satisfies $\rho(I - \eta A_\Phi) < 1$. ∎
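Both directions of this argument are easy to check numerically: when every eigenvalue of $A_\Phi$ has positive real part, any step-size below $\min_\lambda 2\,\mathrm{Re}(\lambda)/|\lambda|^2$ makes the iteration a spectral-radius contraction. A small sketch (the random stochastic matrix and tabular features are our own illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
n, gamma = 12, 0.9
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)                # row-stochastic
Xi = np.diag(rng.dirichlet(np.ones(n)))
Phi = np.eye(n)                                  # tabular features (stable)

A = Phi.T @ Xi @ (np.eye(n) - gamma * P) @ Phi
ev = np.linalg.eigvals(A)
assert np.all(ev.real > 0)                       # Spec(A_Phi) in C_+

eta = 0.9 * min(2 * ev.real / np.abs(ev) ** 2)   # admissible step-size
rho = max(abs(np.linalg.eigvals(np.eye(n) - eta * A)))
print(rho < 1)                                   # iteration contracts
```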

See Proposition 3.2.

###### Proof of Proposition 3.2.

For an orthogonal representation, the iteration matrix can be written as $A_\Phi = \Phi^\top \Xi (I - \gamma P^\pi) \Phi = I - \gamma\, \Phi^\top \Xi P^\pi \Phi$. Then,

$$\begin{aligned}
\mathrm{Spec}(A_\Phi) \subset \mathbb{C}_+
&\iff \mathrm{Spec}(\Phi^\top \Xi P^\pi \Phi) \subset \{z \in \mathbb{C} : \mathrm{Re}(z) < \tfrac{1}{\gamma}\} \\
&\iff \mathrm{Spec}(\Pi P^\pi) \subset \{z \in \mathbb{C} : \mathrm{Re}(z) < \tfrac{1}{\gamma}\} \\
&\iff \mathrm{Spec}(\Pi P^\pi \Pi) \subset \{z \in \mathbb{C} : \mathrm{Re}(z) < \tfrac{1}{\gamma}\}
\end{aligned}$$

The second step follows from the cyclicity of the spectrum and the observation that for an orthogonal representation $\Phi$, the projection can be written as $\Pi = \Phi \Phi^\top \Xi$. The spectral radius condition is immediate. ∎
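The identity $\Pi = \Phi \Phi^\top \Xi$ used in this step can be sanity-checked: for a $\Xi$-orthogonal $\Phi$ it is idempotent and $\Xi$-self-adjoint, i.e. the orthogonal projection onto $\mathrm{Span}(\Phi)$. A sketch (the $\Xi^{1/2}$-QR construction of a $\Xi$-orthogonal basis is our own device):

```python
import numpy as np

rng = np.random.default_rng(5)
n, d = 8, 3
xi = rng.dirichlet(np.ones(n))
Xi = np.diag(xi)

# Build a Xi-orthogonal Phi: QR in the rescaled space, then map back.
Q, _ = np.linalg.qr(np.sqrt(Xi) @ rng.standard_normal((n, d)))
Phi = np.diag(1 / np.sqrt(xi)) @ Q

Pi = Phi @ Phi.T @ Xi                             # projection onto Span(Phi)
print(np.allclose(Phi.T @ Xi @ Phi, np.eye(d)))   # Xi-orthogonality
print(np.allclose(Pi @ Pi, Pi))                   # idempotent
```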

See Proposition 3.3.

###### Proof of Proposition 3.3.

We can write the SVD factorization of the transition matrix as

$$P^\pi = \begin{bmatrix} U_1 & U_2 \end{bmatrix} \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix} \begin{bmatrix} V_1^\top \\ V_2^\top \end{bmatrix} \Xi$$

Then, for $\Phi = U_1$, we have $\Pi P^\pi = U_1 U_1^\top \Xi P^\pi = U_1 \Sigma_1 V_1^\top \Xi$. The necessary and sufficient conditions follow from Proposition 3.2. ∎

See Proposition 3.4.

###### Proof of Proposition 3.4.

We can write the SVD factorization of the successor representation $\Psi = (I - \gamma P^\pi)^{-1}$ as

$$\Psi = \begin{bmatrix} U_1 & U_2 \end{bmatrix} \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix} \begin{bmatrix} V_1^\top \\ V_2^\top \end{bmatrix} \Xi
\qquad
(I - \gamma P^\pi) = \begin{bmatrix} V_1 & V_2 \end{bmatrix} \begin{bmatrix} \Sigma_1^{-1} & 0 \\ 0 & \Sigma_2^{-1} \end{bmatrix} \begin{bmatrix} U_1^\top \\ U_2^\top \end{bmatrix} \Xi$$

Then, for $\Phi = U_1$, the iteration matrix can be written as $A_\Phi = U_1^\top \Xi (I - \gamma P^\pi) U_1 = U_1^\top \Xi V_1 \Sigma_1^{-1}$.

Now, writing the rank-$k$ truncation of the successor representation as $\hat\Psi = U_1 \Sigma_1 V_1^\top \Xi$, with pseudoinverse $\hat\Psi^+ = V_1 \Sigma_1^{-1} U_1^\top \Xi$, the cyclicity of the spectrum implies the desired criterion:

$$\mathrm{Spec}(\hat\Psi^+) = \mathrm{Spec}(V_1 \Sigma_1^{-1} U_1^\top \Xi) = \mathrm{Spec}(U_1^\top \Xi V_1 \Sigma_1^{-1}) \cup \{0\} = \mathrm{Spec}(A_\Phi) \cup \{0\}.$$
∎

See Theorem 4.1.

###### Proof of Theorem 4.1.

Let $\lambda$ be a nonzero eigenvalue of $\Pi P^\pi \Pi$ with an eigenvector $v$. Since $\lambda v = \Pi P^\pi \Pi v \in \mathrm{Span}(\Phi)$ and $\lambda \ne 0$, we have $v \in \mathrm{Span}(\Phi)$.

Since $P^\pi$ is invariant on $\mathrm{Span}(\Phi)$, $\Pi P^\pi \Pi v = P^\pi v = \lambda v$, and therefore $\lambda$ is an eigenvalue of $P^\pi$. Therefore,

$$\mathrm{Spec}(\Pi P^\pi \Pi) \subseteq \mathrm{Spec}(P^\pi) \cup \{0\}.$$

The spectrum of $P^\pi$ thus controls the stability of the representation: $P^\pi$ is a stochastic matrix satisfying $\rho(P^\pi) \le 1 < \frac{1}{\gamma}$, and thus $\rho(\Pi P^\pi \Pi) < \frac{1}{\gamma}$, implying stability through Proposition 3.2. ∎

See Proposition 4.1.

###### Proof of Proposition 4.1.

See Theorem 7.3.1 in Golub & van Loan (2013). ∎
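Proposition 4.1 refers to orthogonal (subspace) iteration, which computes a basis for a dominant invariant subspace, i.e. a partial Schur basis. A minimal sketch (the test matrix with a prescribed, well-separated real spectrum is our own construction):

```python
import numpy as np

def orthogonal_iteration(A, d, iters=500, seed=0):
    """Repeatedly multiply by A and re-orthogonalize; Span(Q) converges to
    the dominant d-dimensional invariant subspace of A (Golub & van Loan,
    Thm 7.3.1), provided |lambda_d| > |lambda_{d+1}|."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((A.shape[0], d)))
    for _ in range(iters):
        Q, _ = np.linalg.qr(A @ Q)
    return Q

# Test matrix with known, well-separated real eigenvalues.
rng = np.random.default_rng(6)
V = rng.standard_normal((10, 10))
eigs = [5.0, 4.0, 3.0, 1.0, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
A = V @ np.diag(eigs) @ np.linalg.inv(V)

Q = orthogonal_iteration(A, d=3)
residual = np.linalg.norm(A @ Q - Q @ (Q.T @ A @ Q))  # invariance defect
print(residual < 1e-6)
```

The sensitivity discussed in Section 5 shows up here too: for nonsymmetric $A$, the computed subspace degrades quickly as the eigenvalue gap shrinks or the eigenbasis becomes ill-conditioned.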

See Theorem 4.2.

###### Proof of Theorem 4.2.

We can rewrite the definition of $\varepsilon$-invariance in terms of a matrix norm: $\|\Pi P^\pi \Pi - P^\pi \Pi\|_\Xi \le \varepsilon$. Thus, letting $E = \Pi P^\pi \Pi - P^\pi \Pi$, we have $\|E\|_\Xi \le \varepsilon$.

Now, suppose that $\Pi P^\pi \Pi$ has an eigenvalue-eigenvector pair $(\lambda, v)$ with $\lambda \ne 0$; as in the proof of Theorem 4.1, $v \in \mathrm{Span}(\Phi)$, so $\Pi v = v$. This means that

$$\lambda v = \Pi P^\pi \Pi v = P^\pi \Pi v + E v = P^\pi v + E v \implies \lambda \in \mathrm{Spec}(P^\pi + E)$$

Now, the Bauer-Fike theorem (see Appendix A above), with $V$ the eigenbasis diagonalizing $P^\pi$, implies that $|\lambda| \le \rho(P^\pi) + \kappa(V) \|E\|_\Xi \le 1 + \kappa(V)\,\varepsilon$. Now, if $\varepsilon < \frac{1}{\kappa(V)}\left(\frac{1}{\gamma} - 1\right)$, then $\rho(\Pi P^\pi \Pi) < \frac{1}{\gamma}$, and stability follows from Proposition 3.2. ∎

See Proposition 4.2. Remark: Each vector orthogonalized out of the Krylov sequence can be interpreted as the component of the reward at the $k$-th timestep that cannot be predicted from the rewards at the first $k-1$ timesteps.

###### Proof of Proposition 4.2.

Any vector $v$ can be decomposed into two components: $v = \Pi_{d-1} v + (I - \Pi_{d-1}) v$, where $\Pi_{d-1}$ denotes the projection onto the first $d-1$ Krylov vectors. Because $P^\pi$ maps the span of the first $d-1$ Krylov vectors into $\mathrm{Span}(\Phi)$, the projection acts as the identity on $P^\pi \Pi_{d-1} v$, and these terms cancel:

$$\begin{aligned}
\frac{\|\Pi P^\pi v - P^\pi v\|_\Xi}{\|v\|_\Xi}
&= \frac{\|\Pi P^\pi (\Pi_{d-1} v + (I - \Pi_{d-1}) v) - P^\pi (\Pi_{d-1} v + (I - \Pi_{d-1}) v)\|_\Xi}{\|\Pi_{d-1} v + (I - \Pi_{d-1}) v\|_\Xi} \\
&= \frac{\|\Pi P^\pi (I - \Pi_{d-1}) v - P^\pi (I - \Pi_{d-1}) v\|_\Xi}{\|\Pi_{d-1} v + (I - \Pi_{d-1}) v\|_\Xi}
\end{aligned}$$

This expression is maximized whenever $(I - \Pi_{d-1}) v$ is nonzero and $\Pi_{d-1} v = 0$, which is true whenever $v$ lies in the orthogonal complement of the first $d-1$ Krylov vectors within $\mathrm{Span}(\Phi)$. For such a $v$,

$$\sup_{v \in \mathrm{Span}(\Phi)} \frac{\|\Pi P^\pi v - P^\pi v\|_\Xi}{\|v\|_\Xi} = \frac{\|\Pi P^\pi v - P^\pi v\|_\Xi}{\|v\|_\Xi}$$
∎

See Theorem 4.3.

###### Proof of Theorem 4.3.

First, we show that the iteration matrix is positive-definite, and then show that this implies stability.

For any $x$, let $v = \Phi x$. By assumption, $\langle v, P^\pi v \rangle_\Xi < \frac{1}{\gamma} \|v\|_\Xi^2$; rearranging this definition implies that $\langle v, (I - \gamma P^\pi) v \rangle_\Xi > 0$. Therefore

$$x^\top A_\Phi x = v^\top \Xi (I - \gamma P^\pi) v = \langle v, (I - \gamma P^\pi) v \rangle_\Xi > 0.$$

Now, we consider an eigenvalue $\lambda$ of the iteration matrix $A_\Phi$, and a corresponding unit eigenvector $x$. We know that $\bar\lambda$ is also an eigenvalue of $A_\Phi$, with unit eigenvector $\bar x$. Then,

$$(x + \bar x)^\top A_\Phi (x + \bar x) = \lambda x^\top x + \lambda \bar x^\top x + \bar\lambda x^\top \bar x + \bar\lambda \bar x^\top \bar x = 2(\lambda + \bar\lambda)$$

Positive-definiteness implies that $(x + \bar x)^\top A_\Phi (x + \bar x) > 0$, and therefore the real component of $\lambda$, $\mathrm{Re}(\lambda) = \frac{1}{2}(\lambda + \bar\lambda)$, must also be positive. ∎

See Proposition 4.3.

###### Proof of Proposition 4.3.

We shall show that $\langle v, P^\pi v \rangle_\Xi < \gamma^{-1} \|v\|_\Xi^2$ for all $v \in \mathrm{Span}(\Phi)$, which implies the proposition. Since the skew-adjoint part of $P^\pi$ contributes nothing to a real quadratic form,

$$\langle v, P^\pi v \rangle_\Xi = \left\langle v, \tfrac{1}{2}\left(P^\pi + \Xi^{-1} (P^\pi)^\top \Xi\right) v \right\rangle_\Xi$$

Consider some $v \in \mathrm{Span}(\Phi)$, which can be expressed in the eigenbasis $\{u_k\}$ of the symmetrized matrix as $v = \sum_{k=d^*}^n \alpha_k u_k$. We have

$$\begin{aligned}
\langle v, P^\pi v \rangle_\Xi &= \left\langle v, \tfrac{1}{2}\left(P^\pi + \Xi^{-1} (P^\pi)^\top \Xi\right) v \right\rangle_\Xi \\
&= \left\langle \sum_{k=d^*}^n \alpha_k u_k,\; \sum_{k=d^*}^n \lambda_k \alpha_k u_k \right\rangle_\Xi \\
&< \gamma^{-1} \left\langle \sum_{k=d^*}^n \alpha_k u_k,\; \sum_{k=d^*}^n \alpha_k u_k \right\rangle_\Xi \\
&= \gamma^{-1} \|v\|_\Xi^2
\end{aligned}$$

Hence $\langle v, P^\pi v \rangle_\Xi < \gamma^{-1} \|v\|_\Xi^2$ for every $v \in \mathrm{Span}(\Phi)$. The second-to-last line is a result of the eigenvalues $\lambda_k$, $k \ge d^*$, being bounded by $\gamma^{-1}$.

Since $\langle v, P^\pi v \rangle_\Xi < \gamma^{-1} \|v\|_\Xi^2$, we also have $\langle v, (I - \gamma P^\pi) v \rangle_\Xi > 0$, and stability ensues from Theorem 4.3.

As a sidenote, we can use this same sequence of steps to show that a representation using only the top eigenvectors of the symmetrized transition matrix (those with eigenvalues exceeding $\gamma^{-1}$) is never stable. Defining the representation from those eigenvectors and following the same set of steps yields that $\langle v, (I - \gamma P^\pi) v \rangle_\Xi < 0$ for any $v \in \mathrm{Span}(\Phi)$. This implies that for this representation, the iteration matrix is negative-definite, and has all eigenvalues with negative real component; it is therefore not stable. ∎

## Appendix C Empirical Evaluation

### C.1 Experimental Setup

Four-room Domain: The four-room domain (Sutton et al., 1999) has 104 discrete states arranged into four “rooms”. At any state, the agent can take one of four actions corresponding to cardinal directions; if a wall blocks movement in the selected direction, the agent remains in place.

Policy Evaluation: We augment this domain with a task where the agent must reach the top right corner of the environment. The corresponding reward function is sparse: the agent receives a reward of +1 when it is in the desired state, and zero otherwise. The policy evaluation problem is to find the value function of a near-optimal policy, which takes the optimal action with high probability and a randomly selected action otherwise. Data is collected by rolling out fixed-length trajectories from the center of the bottom-left room with a uniform policy, which samples actions uniformly at random. The discount factor $\gamma$ is held fixed across experiments.

### C.2 Exact Evaluation

In this setting, the exact transition matrix and data distribution are used to create the representation. We compute the decompositions according to Table 1 and Appendix A. Stability is measured for a given representation by explicitly forming the induced iteration matrix, computing its eigenvalues, and checking that every eigenvalue has positive real part. To measure accuracy, we considered three metrics (Figure C.2).

• Policy Accuracy: (displayed in paper) This measures how well the greedy policy for the true value function matches the greedy policy for the estimated value function. This is given as

$$\frac{1}{|S|} \sum_{s \in S} \delta\left(\operatorname*{argmax}_a \hat Q(s, a) \ne \operatorname*{argmax}_a Q^\pi(s, a)\right)$$
• Optimal Projection Error: This measures how far the true value function is from the subspace of expressible value functions, $\|\Pi Q^\pi - Q^\pi\|_\Xi$. As the number of features increases, this error monotonically decreases, but it may not be indicative of the quality of the solution.

• Bellman Projection Error: This measures how far the solution reached by TD(0) (the TD fixed point) is from the true value function: $\|\Phi \theta^*_{TD} - Q^\pi\|_\Xi$. This measure of error is nonmonotonic (adding extra features can cause errors to increase) and unbounded. Furthermore, in the regime of a low number of features, this error greatly underestimates the quality of the recovered solution.
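These metrics have direct matrix expressions; a sketch of the first and last (the function names are ours, `Q` arrays are state-by-action tables, and the random instance is illustrative):

```python
import numpy as np

def policy_accuracy_error(Q_hat, Q_true):
    """Fraction of states whose greedy action under the estimated Q
    disagrees with the greedy action under the true Q."""
    return np.mean(Q_hat.argmax(axis=1) != Q_true.argmax(axis=1))

def bellman_projection_error(Phi, P, Xi, r, gamma):
    """Xi-weighted distance between the TD fixed point Phi @ theta* and
    the true value function (I - gamma P)^{-1} r."""
    n = P.shape[0]
    A = Phi.T @ Xi @ (np.eye(n) - gamma * P) @ Phi
    theta = np.linalg.solve(A, Phi.T @ Xi @ r)   # TD fixed point
    err = Phi @ theta - np.linalg.solve(np.eye(n) - gamma * P, r)
    return float(np.sqrt(err @ Xi @ err))

# Sanity check: with the full tabular basis, the TD fixed point is exact.
rng = np.random.default_rng(8)
n = 6
P = rng.random((n, n)); P /= P.sum(axis=1, keepdims=True)
Xi = np.diag(rng.dirichlet(np.ones(n)))
r = rng.random(n)
print(np.isclose(bellman_projection_error(np.eye(n), P, Xi, r, 0.9), 0.0))
```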

### C.3 Estimation from Samples

To measure how well the representations can be estimated from samples, we consider the difference between the subspaces spanned by the estimated and true representations. In particular, we sample transitions from the data distribution and reconstruct the empirical transition matrix $\hat P^\pi$ from these transitions. If a particular state-action pair is never sampled, the prior we use for the transition matrix is that taking this action deterministically leads back to the same state. We construct the estimated representation $\hat\Phi$ from $\hat P^\pi$, and measure the distance between the true representation and the estimated representation as the Frobenius norm of the difference between the corresponding projections, $\|\Pi_\Phi - \Pi_{\hat\Phi}\|_F$. The Frobenius norm is selected in particular because it measures an expected distance, as compared to the maximum distance measured by the operator norm $\|\Pi_\Phi - \Pi_{\hat\Phi}\|_2$.
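The subspace distance itself reduces to comparing the two induced projections. A sketch (assuming Euclidean orthogonalization for simplicity, whereas the paper's representations are $\Xi$-orthogonal):

```python
import numpy as np

def subspace_distance(Phi, Phi_hat):
    """Frobenius distance between the orthogonal projections onto the
    column spans of the true and estimated representations."""
    def proj(M):
        Q, _ = np.linalg.qr(M)
        return Q @ Q.T
    return np.linalg.norm(proj(Phi) - proj(Phi_hat), ord="fro")

e1 = np.array([[1.0], [0.0], [0.0]])
e2 = np.array([[0.0], [1.0], [0.0]])
print(subspace_distance(e1, 2 * e1))   # same span -> 0
print(subspace_distance(e1, e2))       # orthogonal spans -> sqrt(2)
```

Working with projections makes the measure invariant to the particular basis chosen for each representation, which matters because eigenvector routines return bases only up to rotation and sign.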

### C.4 Estimation with Gradient Descent

When learning the representation using gradient descent, we train a network with one hidden layer of $d$ units and no activation function, which takes in state-action pairs encoded in one-hot form and outputs the prediction targets. The values of the units in the hidden layer form the representation $\phi(s, a; \theta)$. The network is trained with minibatched stochastic gradient updates, all implemented in Jax.

• Schur Decomposition: To mimic the orthogonal iteration procedure, we use the following training loss, where $\theta_t$ are the parameters of the target network:

$$L(\theta; \theta_t) = \mathbb{E}_{(s,a) \sim \xi,\; s' \sim P(\cdot \mid s, a)}\left[\left\|f(s, a; \theta) - \mathbb{E}_{a' \sim \pi}\left[\phi(s', a'; \theta_t)\right]\right\|^2\right]$$

This loss is optimized using stochastic gradient descent. The target network is updated periodically, and after every target network update, the representation is renormalized to satisfy $\Phi^\top \Xi \Phi = I$.

• Reward Krylov Basis: We use a regression loss in which the $i$-th output of the network is trained to predict the expected reward $i$ steps into the future,

$$L(\theta) = \mathbb{E}_{(s,a) \sim \xi}\left[\sum_{i=1}^{d}\left(f_i(s, a; \theta) - \mathbb{E}_\pi\left[r_i \mid s_0 = s, a_0 = a\right]\right)^2\right]$$

where the inner expectation comes from trajectories that are generated by the policy being evaluated, starting from $(s, a)$. Although this loss requires that the evaluated policy be run in the environment, it serves a didactic purpose: it shows that these Krylov bases can be learned with additional domain knowledge. This loss is optimized using the Adam optimizer.