Sample Efficient Reinforcement Learning via Low-Rank Matrix Estimation

06/11/2020 ∙ by Devavrat Shah, et al.

We consider the question of learning the Q-function in a sample-efficient manner for reinforcement learning with continuous state and action spaces under a generative model. If the Q-function is Lipschitz continuous, then the minimal sample complexity for estimating an ϵ-optimal Q-function is known to scale as Ω(1/ϵ^{d_1+d_2+2}) per classical non-parametric learning theory, where d_1 and d_2 denote the dimensions of the state and action spaces, respectively. The Q-function, when viewed as a kernel, induces a Hilbert-Schmidt operator and hence possesses a square-summable spectrum. This motivates us to consider a parametric class of Q-functions parameterized by its "rank" r, which contains all Lipschitz Q-functions as r → ∞. As our key contribution, we develop a simple, iterative learning algorithm that finds an ϵ-optimal Q-function with sample complexity of O(1/ϵ^{max(d_1, d_2)+2}) when the optimal Q-function has low rank r and the discount factor γ is below a certain threshold. Thus, this provides an exponential improvement in sample complexity. To enable our result, we develop a novel Matrix Estimation algorithm that faithfully estimates an unknown low-rank matrix in the ℓ_∞ sense even in the presence of arbitrary bounded noise, which might be of interest in its own right. Empirical results on several stochastic control tasks confirm the efficacy of our "low-rank" algorithms.
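
To make the size of the gap concrete, the two rates quoted in the abstract can be compared numerically. The following is only an illustrative sketch: the function names are hypothetical, and it compares the ϵ-dependence alone, ignoring constants, logarithmic factors, and any dependence on r or γ, none of which are specified in the abstract.

```python
def sample_complexity_exponents(d1: int, d2: int) -> tuple:
    """Exponents of 1/eps in the two rates quoted in the abstract.

    Returns (lipschitz_exponent, low_rank_exponent), i.e.
    (d1 + d2 + 2, max(d1, d2) + 2). Constants and log factors are ignored.
    """
    return d1 + d2 + 2, max(d1, d2) + 2


def scaling_ratio(d1: int, d2: int, inv_eps: int) -> int:
    """Ratio (1/eps)^(d1+d2+2) / (1/eps)^(max(d1,d2)+2).

    inv_eps is 1/eps, passed as an integer to keep the arithmetic exact.
    The exponent difference equals min(d1, d2).
    """
    lipschitz_exp, low_rank_exp = sample_complexity_exponents(d1, d2)
    return inv_eps ** (lipschitz_exp - low_rank_exp)


if __name__ == "__main__":
    # Hypothetical setting: 3-dimensional states and actions, eps = 0.1.
    print(sample_complexity_exponents(3, 3))
    print(scaling_ratio(3, 3, 10))
```

Because the difference between the two exponents is min(d_1, d_2), the gap in ϵ-dependence grows exponentially with the smaller of the two dimensions, which is the sense in which the improvement is exponential.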


