An Exponential Lower Bound for Linearly-Realizable MDPs with Constant Suboptimality Gap

03/23/2021
by Yuanhao Wang et al.

A fundamental question in the theory of reinforcement learning is: if the optimal Q-function lies in the linear span of a given d-dimensional feature mapping, is sample-efficient reinforcement learning (RL) possible? The recent and remarkable result of Weisz et al. (2020) resolved this question in the negative, providing an exponential (in d) sample-size lower bound that holds even if the agent has access to a generative model of the environment. One may hope that this information-theoretic barrier can be circumvented by a further, even more favorable assumption: that there is a constant suboptimality gap between the optimal Q-value of the best action and that of the second-best action, for all states. The hope is that a large suboptimality gap would make optimal actions easier to identify, rendering the problem tractable; indeed, provided the agent has access to a generative model, sample-efficient RL is in fact possible under this additional assumption. This work studies the question in the standard online RL setting, where our main result resolves it in the negative: we show that an exponential sample-complexity lower bound still holds even when a constant suboptimality gap is assumed in addition to a linearly realizable optimal Q-function. Perhaps surprisingly, this implies an exponential separation between the online RL setting and the generative-model setting. Complementing the hardness result, we give two positive results showing that provably sample-efficient RL is possible either under an additional low-variance assumption or under a novel hypercontractivity assumption (both implicitly place stronger conditions on the underlying dynamics model).
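For concreteness, the two assumptions in play can be written as follows (the notation phi, theta, Delta below is ours, following standard usage, and is not quoted from the paper). Linear realizability says the optimal Q-function is exactly linear in a known feature map; the gap condition lower-bounds the margin between the best and second-best action at every state:

\[
\exists\, \theta^\star \in \mathbb{R}^d :\quad Q^\star(s,a) = \langle \phi(s,a), \theta^\star \rangle \quad \text{for all } (s,a),
\]
\[
\min_{s} \Bigl[\, Q^\star\bigl(s, \pi^\star(s)\bigr) \;-\; \max_{a \neq \pi^\star(s)} Q^\star(s,a) \,\Bigr] \;\ge\; \Delta \;>\; 0.
\]

The hypercontractivity condition mentioned at the end is, in its standard form (the paper's exact variant may differ), a bound of fourth moments by squared second moments of the feature distribution:

\[
\mathbb{E}_{x \sim \mathcal{D}}\bigl[\langle x, v \rangle^4\bigr] \;\le\; C \, \Bigl(\mathbb{E}_{x \sim \mathcal{D}}\bigl[\langle x, v \rangle^2\bigr]\Bigr)^{2} \quad \text{for all } v \in \mathbb{R}^d.
\]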
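To see why the gap makes the generative-model setting tractable, here is a minimal sketch (ours, not the paper's construction): if every Monte Carlo Q-estimate is within Delta/2 of the true value, the greedy action under the estimates is exactly optimal, and a Hoeffding-style bound says roughly (1/Delta^2) * log(1/failure probability) generative-model queries per action suffice. All names in the snippet (q_star, sample_returns, etc.) are illustrative assumptions, not from the paper.

```python
import numpy as np

# Minimal sketch (ours): with a generative model, a suboptimality gap Delta
# makes optimal-action identification easy. If every Q-estimate is within
# Delta/2 of the truth, the empirical argmax is exactly the optimal action.

rng = np.random.default_rng(0)

d, num_actions = 4, 5
theta_star = rng.normal(size=d)            # unknown linear parameter
phi = rng.normal(size=(num_actions, d))    # feature map phi(s, a) at a fixed state s
q_star = phi @ theta_star                  # linearly realizable optimal Q-values

def sample_returns(action, n, noise=1.0):
    """n generative-model queries: noisy samples of the return of `action`."""
    return q_star[action] + rng.normal(scale=noise, size=n)

sorted_q = np.sort(q_star)
delta = sorted_q[-1] - sorted_q[-2]                 # suboptimality gap at s
n = int(np.ceil(8.0 / delta**2 * np.log(1000)))     # Hoeffding: error < Delta/2 w.h.p.

q_hat = np.array([sample_returns(a, n).mean() for a in range(num_actions)])

print("greedy on estimates:", np.argmax(q_hat), "| true optimum:", np.argmax(q_star))
```

The paper's point is that this style of argument breaks down in the online setting, where the agent cannot query arbitrary state-action pairs and must reach states through the dynamics.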


Related research:

05/17/2021 · Sample-Efficient Reinforcement Learning Is Feasible for Linearly Realizable MDPs with Limited Revisiting
Low-complexity models such as linear function representation play a pivo...

11/14/2022 · Linear Reinforcement Learning with Ball Structure Action Space
We study the problem of Reinforcement Learning (RL) with linear function...

10/05/2021 · TensorPlan and the Few Actions Lower Bound for Planning in MDPs under Linear Realizability of Optimal Value Functions
We consider the minimax query complexity of online planning with a gener...

07/13/2020 · A Provably Efficient Sample Collection Strategy for Reinforcement Learning
A common assumption in reinforcement learning (RL) is to have access to ...

11/02/2020 · A Variant of the Wang-Foster-Kakade Lower Bound for the Discounted Setting
Recently, Wang et al. (2020) showed a highly intriguing hardness result ...

07/18/2022 · A Few Expert Queries Suffices for Sample-Efficient RL with Resets and Linear Value Approximation
The current paper studies sample-efficient Reinforcement Learning (RL) i...

03/29/2023 · Does Sparsity Help in Learning Misspecified Linear Bandits?
Recently, the study of linear misspecified bandits has generated intrigu...
