Computational-Statistical Gaps in Reinforcement Learning

02/11/2022
by Daniel Kane, et al.

Reinforcement learning with function approximation has recently achieved tremendous results in applications with large state spaces. This empirical success has motivated a growing body of theoretical work proposing necessary and sufficient conditions under which efficient reinforcement learning is possible. From this line of work, a remarkably simple minimal sufficient condition has emerged for sample-efficient reinforcement learning: MDPs whose optimal value functions V^* and Q^* are linear in some known low-dimensional features. In this setting, recent works have designed sample-efficient algorithms that require a number of samples polynomial in the feature dimension and independent of the size of the state space. These works, however, leave the design of computationally efficient algorithms as future work, and this is considered a major open problem in the community. In this work, we make progress on this open problem by presenting the first computational lower bound for RL with linear function approximation: unless NP = RP, no randomized polynomial-time algorithm exists for deterministic-transition MDPs with a constant number of actions and linear optimal value functions. To prove this, we show a reduction from Unique-SAT, in which we convert a CNF formula into an MDP with deterministic transitions, a constant number of actions, and low-dimensional linear optimal value functions. This result also exhibits the first computational-statistical gap in reinforcement learning with linear function approximation: the underlying statistical problem is information-theoretically solvable with a polynomial number of queries, but no computationally efficient algorithm exists unless NP = RP. Finally, we also prove a quasi-polynomial-time lower bound under the Randomized Exponential Time Hypothesis.
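The linear-realizability condition at the heart of this result is simple to state: the optimal action-value function satisfies Q^*(s, a) = <theta, phi(s, a)> for a known feature map phi and some parameter vector theta. The sketch below is a toy illustration of this condition, not the paper's Unique-SAT construction: it builds a small deterministic chain MDP whose Q^* is exactly linear in a hand-picked one-dimensional feature, verifies the Bellman optimality equation, and checks that acting greedily with respect to the linear Q^* is optimal. The feature map phi, the parameter theta, and the chain MDP are all assumptions of this example.

```python
# A minimal sketch of "linear Q^*" realizability, assuming a toy chain MDP.
# This is NOT the paper's Unique-SAT reduction; phi, theta, and the MDP are
# illustrative choices made so that Q^*(s, a) = <theta, phi(s, a)> holds exactly.

import numpy as np

H = 5             # states 0..H; state H is terminal (the goal)
gamma = 0.9       # discount factor
ACTIONS = (0, 1)  # 0 = stay, 1 = advance

def step(s, a):
    """Deterministic transition: action 1 moves one state to the right."""
    return min(s + a, H)

def reward(s, a):
    """Reward 1 only on the step that enters the goal state H."""
    return 1.0 if (s == H - 1 and a == 1) else 0.0

# Known low-dimensional (here: one-dimensional) feature map. In this chain,
# Q^*(s, a) = gamma^(H - s - a) for every nonterminal s, so a single feature
# suffices and theta is just [1.0].
def phi(s, a):
    return np.array([gamma ** (H - s - a)])

theta = np.array([1.0])

def q_star(s, a):
    """Optimal action-value function, linear in the known features."""
    return float(theta @ phi(s, a))

# Sanity check: Q^* satisfies the Bellman optimality equation
#   Q^*(s, a) = r(s, a) + gamma * max_{a'} Q^*(s', a'),  with V^*(H) = 0.
for s in range(H):
    for a in ACTIONS:
        s2 = step(s, a)
        v_next = 0.0 if s2 == H else max(q_star(s2, a2) for a2 in ACTIONS)
        assert abs(q_star(s, a) - (reward(s, a) + gamma * v_next)) < 1e-12

# Acting greedily w.r.t. the linear Q^* recovers the optimal policy.
s, ret, t = 0, 0.0, 0
while s < H:
    a = max(ACTIONS, key=lambda a: q_star(s, a))
    ret += gamma ** t * reward(s, a)
    s, t = step(s, a), t + 1
print(f"greedy return: {ret:.6f}  optimal: {gamma ** (H - 1):.6f}")
```

Note that this sketch is handed theta, so evaluating and acting on the linear Q^* is trivial; the paper's hardness result concerns a learner who knows only phi and must find a near-optimal policy from interaction, which is where the computational barrier arises.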



Related research

02/25/2023 · Exponential Hardness of Reinforcement Learning with Linear Function Approximation
A fundamental question in reinforcement learning theory is: suppose the ...

02/29/2020 · Learning Near Optimal Policies with Low Inherent Bellman Error
We study the exploration problem with approximate linear action-value fu...

06/23/2022 · Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation
We study reinforcement learning with linear function approximation where...

06/22/2023 · Achieving Sample and Computational Efficient Reinforcement Learning by Action Space Reduction via Grouping
Reinforcement learning often needs to deal with the exponential growth o...

06/24/2022 · Computationally Efficient PAC RL in POMDPs with Latent Determinism and Conditional Embeddings
We study reinforcement learning with function approximation for large-sc...

07/12/2021 · Polynomial Time Reinforcement Learning in Correlated FMDPs with Linear Value Functions
Many reinforcement learning (RL) environments in practice feature enormo...

09/18/2023 · Exploring and Learning in Sparse Linear MDPs without Computationally Intractable Oracles
The key assumption underlying linear Markov Decision Processes (MDPs) is...
