Overcoming the Long Horizon Barrier for Sample-Efficient Reinforcement Learning with Latent Low-Rank Structure

06/07/2022
by Tyler Sam, et al.

The practicality of reinforcement learning algorithms has been limited by poor scaling with respect to the problem size, as the sample complexity of learning an ϵ-optimal policy is Ω(|S||A|H^3/ϵ^2) over worst-case instances of an MDP with state space S, action space A, and horizon H. We consider a class of MDPs that exhibit low-rank structure, where the latent features are unknown. We argue that a natural combination of value iteration and low-rank matrix estimation results in an estimation error that grows doubly exponentially in the horizon H. We then provide a new algorithm, along with statistical guarantees, that efficiently exploits low-rank structure given access to a generative model, achieving a sample complexity of O(d^5(|S|+|A|)poly(H)/ϵ^2) for a rank-d setting, which is minimax optimal with respect to the scaling of |S|, |A|, and ϵ. In contrast to the literature on linear and low-rank MDPs, we do not require a known feature mapping, our algorithm is computationally simple, and our results hold for long time horizons. Our results provide insights on the minimal low-rank structural assumptions required on the MDP with respect to the transition kernel versus the optimal action-value function.
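To give a sense of why low-rank structure can replace the |S||A| dependence with |S|+|A|, here is a minimal sketch under strong simplifying assumptions: a noise-free, rank-1 action-value matrix and a single anchor state and anchor action. It is not the algorithm from the paper, and the function names and the anchor choice are hypothetical, introduced only for illustration.

```python
import random


def make_rank_one_q(num_states, num_actions, seed=0):
    """Build a hypothetical rank-1 matrix Q(s, a) = u[s] * v[a]."""
    rng = random.Random(seed)
    # Factors are kept away from zero so the anchor entry is a safe divisor.
    u = [rng.uniform(0.5, 1.5) for _ in range(num_states)]
    v = [rng.uniform(0.5, 1.5) for _ in range(num_actions)]
    return [[u_s * v_a for v_a in v] for u_s in u]


def estimate_from_anchors(q, anchor_state=0, anchor_action=0):
    """Rebuild the full matrix from one observed row and one observed column.

    For a rank-1 matrix, Q(s, a) = Q(s, a#) * Q(s#, a) / Q(s#, a#),
    where s# and a# are the anchor state and anchor action.
    """
    row = q[anchor_state]                       # Q(s#, a) for every action a
    col = [r[anchor_action] for r in q]         # Q(s, a#) for every state s
    pivot = q[anchor_state][anchor_action]      # Q(s#, a#), shared entry
    q_hat = [[c * r / pivot for r in row] for c in col]
    num_observed = len(row) + len(col) - 1      # shared entry counted once
    return q_hat, num_observed


num_states, num_actions = 50, 20
q = make_rank_one_q(num_states, num_actions)
q_hat, num_observed = estimate_from_anchors(q)
max_error = max(
    abs(q_hat[s][a] - q[s][a])
    for s in range(num_states)
    for a in range(num_actions)
)
print(f"observed {num_observed} of {num_states * num_actions} entries")
print(f"max reconstruction error: {max_error:.2e}")
```

In this toy setting, |S|+|A|-1 observed entries determine all |S||A| entries. With noisy estimates and rank d > 1, the anchor entry becomes a d-by-d submatrix that must be inverted, and estimation errors can compound across value-iteration steps, which is the horizon issue the abstract describes.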

research
10/11/2022

Multi-User Reinforcement Learning with Low Rank Rewards

In this work, we consider the problem of collaborative multi-user reinfo...
research
06/11/2020

Sample Efficient Reinforcement Learning via Low-Rank Matrix Estimation

We consider the question of learning Q-function in a sample efficient ma...
research
11/26/2015

Incremental Truncated LSTD

Balancing between computational efficiency and sample efficiency is an i...
research
02/29/2020

Learning Near Optimal Policies with Low Inherent Bellman Error

We study the exploration problem with approximate linear action-value fu...
research
06/22/2021

Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations

There have been many recent advances on provably efficient Reinforcement...
research
11/14/2022

On the Global Convergence of Fitted Q-Iteration with Two-layer Neural Network Parametrization

Deep Q-learning based algorithms have been applied successfully in many ...
research
05/24/2023

Matrix Estimation for Offline Reinforcement Learning with Low-Rank Structure

We consider offline Reinforcement Learning (RL), where the agent does no...