Eigensubspace of Temporal-Difference Dynamics and How It Improves Value Approximation in Reinforcement Learning

06/29/2023
by Qiang He, et al.

We propose Eigensubspace Regularized Critic (ERC), a novel value approximation method for deep reinforcement learning (RL). ERC is motivated by an analysis of the dynamics of the Q-value approximation error in the Temporal-Difference (TD) method: the error follows a path defined by the 1-eigensubspace of the transition kernel associated with the Markov Decision Process (MDP). This analysis reveals a fundamental property of TD learning that previous deep RL approaches have left unused. ERC adds a regularizer that guides the approximation error toward the 1-eigensubspace, resulting in a more efficient and stable path of value approximation. We prove theoretically that ERC converges, and both theoretical analysis and experiments demonstrate that ERC effectively reduces the variance of value functions. On 20 of the 26 tasks in the DMControl benchmark, ERC outperforms state-of-the-art methods. It also shows significant advantages in Q-value approximation and variance reduction. Our code is available at https://sites.google.com/view/erc-ecml23/.
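
To make the idea of the 1-eigensubspace concrete, here is a minimal toy sketch; it is not the authors' implementation, and the exact form of the ERC regularizer is given only in the full text. It relies on one standard fact: every row of a transition matrix sums to one, so the all-ones vector is an eigenvector with eigenvalue 1. The sketch assumes the 1-eigensubspace is spanned by that vector alone (true for a strictly positive matrix like the one built here) and penalizes the part of an error vector that is orthogonal to it. The names `random_transition_matrix` and `eigensubspace_penalty` are hypothetical and chosen for illustration.

```python
import numpy as np


def random_transition_matrix(n, seed=0):
    """Build a toy row-stochastic matrix with strictly positive entries.

    Hypothetical stand-in for the transition kernel of an MDP under a fixed policy.
    """
    rng = np.random.default_rng(seed)
    P = rng.random((n, n)) + 0.1          # strictly positive entries
    return P / P.sum(axis=1, keepdims=True)  # normalize each row to sum to 1


def eigensubspace_penalty(error):
    """Squared norm of the component of `error` orthogonal to the all-ones vector.

    Assumption: the 1-eigensubspace is span{(1, ..., 1)}, so projecting onto it
    means replacing every entry with the mean of the vector.
    """
    error = np.asarray(error, dtype=float)
    return float(np.sum((error - error.mean()) ** 2))


n = 5
P = random_transition_matrix(n)
ones = np.ones(n)

# Rows sum to one, so the all-ones vector satisfies P @ ones == ones.
ones_is_fixed = bool(np.allclose(P @ ones, ones))

# An error that is constant across entries lies in the assumed 1-eigensubspace.
constant_error = np.full(n, 0.7)
# An error that varies across entries has a component outside it.
mixed_error = np.array([0.7, -0.2, 1.3, 0.0, 0.4])

penalty_constant = eigensubspace_penalty(constant_error)
penalty_mixed = eigensubspace_penalty(mixed_error)

print("ones is an eigenvector with eigenvalue 1:", ones_is_fixed)
print("penalty for constant error:", penalty_constant)
print("penalty for mixed error:", penalty_mixed)
```

Under these assumptions the penalty is the number of entries times the variance of the error, which gives one intuition for why steering the error toward the 1-eigensubspace goes together with variance reduction.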
