Harnessing Structures for Value-Based Planning and Reinforcement Learning

09/26/2019
by Yuzhe Yang, et al.
Value-based methods constitute a fundamental methodology in planning and deep reinforcement learning (RL). In this paper, we propose to exploit the underlying structures of the state-action value function, i.e., the Q function, for both planning and deep RL. In particular, if the underlying system dynamics induce global structure in the Q function, one should be able to infer the function more effectively by leveraging that structure. Specifically, we investigate the low-rank structure, which exists widely in big data matrices. We verify empirically the existence of low-rank Q functions in the context of control and deep RL tasks (Atari games). As our key contribution, by leveraging Matrix Estimation (ME) techniques, we propose a general framework to exploit the underlying low-rank structure in Q functions. This leads to a more efficient planning procedure for classical control, as well as a simple scheme that can be applied to any value-based RL technique to consistently achieve better performance on "low-rank" tasks. Extensive experiments on control tasks and Atari games confirm the efficacy of our approach.
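The low-rank idea in the abstract can be sketched in a few lines: treat Q as a |S| x |A| matrix, evaluate only a sampled subset of its entries, and fill in the rest with a matrix-estimation step. The soft-impute-style solver below is an illustrative stand-in on synthetic data, not the authors' exact ME routine; the matrix sizes, rank, observation fraction, and shrinkage value are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic rank-2 Q matrix over 50 states and 20 actions.
n_states, n_actions, rank = 50, 20, 2
Q_true = rng.normal(size=(n_states, rank)) @ rng.normal(size=(rank, n_actions))

# Observe a random 40% of the entries (the "cheap" Q evaluations).
mask = rng.random((n_states, n_actions)) < 0.4
Q_obs = np.where(mask, Q_true, 0.0)

def soft_impute(Q_obs, mask, shrink=0.5, n_iters=200):
    """Complete a partially observed matrix via iterative
    singular-value soft-thresholding (a standard ME baseline)."""
    X = Q_obs.copy()
    for _ in range(n_iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X = U @ np.diag(np.maximum(s - shrink, 0.0)) @ Vt
        X[mask] = Q_obs[mask]  # keep observed entries fixed
    return X

Q_hat = soft_impute(Q_obs, mask)

# If reconstruction is accurate, the greedy policy argmax_a Q_hat(s, a)
# should agree with the true greedy policy on most states.
rel_err = np.linalg.norm(Q_hat - Q_true) / np.linalg.norm(Q_true)
agreement = np.mean(Q_hat.argmax(axis=1) == Q_true.argmax(axis=1))
print(f"relative error: {rel_err:.3f}, greedy-action agreement: {agreement:.2%}")
```

The point of the sketch is the planning shortcut the paper describes: instead of computing every Q(s, a), one computes a subset and lets the low-rank structure supply the remainder before acting greedily.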

Related research

- Uncertainty-aware Low-Rank Q-Matrix Estimation for Deep Reinforcement Learning (11/19/2021)
- Hamiltonian Q-Learning: Leveraging Importance-sampling for Data Efficient RL (11/11/2020)
- Low-rank State-action Value-function Approximation (04/18/2021)
- Tensor and Matrix Low-Rank Value-Function Approximation in Reinforcement Learning (01/21/2022)
- Sample Efficient Reinforcement Learning via Low-Rank Matrix Estimation (06/11/2020)
- Graph Backup: Data Efficient Backup Exploiting Markovian Transitions (05/31/2022)
- Chi-square Tests Driven Method for Learning the Structure of Factored MDPs (06/27/2012)
