Sample Efficient Reinforcement Learning via Low-Rank Matrix Estimation

by   Devavrat Shah, et al.

We consider the question of learning Q-function in a sample efficient manner for reinforcement learning with continuous state and action spaces under a generative model. If Q-function is Lipschitz continuous, then the minimal sample complexity for estimating ϵ-optimal Q-function is known to scale as Ω(1/ϵ^d_1+d_2 +2) per classical non-parametric learning theory, where d_1 and d_2 denote the dimensions of the state and action spaces respectively. The Q-function, when viewed as a kernel, induces a Hilbert-Schmidt operator and hence possesses square-summable spectrum. This motivates us to consider a parametric class of Q-functions parameterized by its "rank" r, which contains all Lipschitz Q-functions as r →∞. As our key contribution, we develop a simple, iterative learning algorithm that finds ϵ-optimal Q-function with sample complexity of O(1/ϵ^max(d_1, d_2)+2) when the optimal Q-function has low rank r and the discounting factor γ is below a certain threshold. Thus, this provides an exponential improvement in sample complexity. To enable our result, we develop a novel Matrix Estimation algorithm that faithfully estimates an unknown low-rank matrix in the ℓ_∞ sense even in the presence of arbitrary bounded noise, which might be of interest in its own right. Empirical results on several stochastic control tasks confirm the efficacy of our "low-rank" algorithms.


Overcoming the Long Horizon Barrier for Sample-Efficient Reinforcement Learning with Latent Low-Rank Structure

The practicality of reinforcement learning algorithms has been limited d...

Uncertainty-aware Low-Rank Q-Matrix Estimation for Deep Reinforcement Learning

Value estimation is one key problem in Reinforcement Learning. Albeit ma...

Low-rank State-action Value-function Approximation

Value functions are central to Dynamic Programming and Reinforcement Lea...

A Universal Variance Reduction-Based Catalyst for Nonconvex Low-Rank Matrix Recovery

We propose a generic framework based on a new stochastic variance-reduce...

Learning Non-Parametric Basis Independent Models from Point Queries via Low-Rank Methods

We consider the problem of learning multi-ridge functions of the form f(...

Unifying Framework for Crowd-sourcing via Graphon Estimation

We consider the question of inferring true answers associated with tasks...

Harnessing Structures for Value-Based Planning and Reinforcement Learning

Value-based methods constitute a fundamental methodology in planning and...

Please sign up or login with your details

Forgot password? Click here to reset