On Query-efficient Planning in MDPs under Linear Realizability of the Optimal State-value Function

02/03/2021
by   Gellért Weisz, et al.

We consider the problem of local planning in fixed-horizon Markov Decision Processes (MDPs) with a generative model, under the assumption that the optimal value function lies in the span of a feature map that is accessible through the generative model. In contrast to previous work, which assumed linear realizability of all policies, we consider the significantly relaxed assumption that a single (deterministic) policy is linearly realizable. A recent lower bound established that the related problem, in which the action-value function of the optimal policy is assumed to be linearly realizable, requires an exponential number of queries either in H (the horizon of the MDP) or in d (the dimension of the feature map). That construction crucially relies on an exponentially large action set. In contrast, in this work we establish that poly(H, d) learning is possible (with state-value-function realizability) whenever the action set is small, i.e. O(1). In particular, we present the TensorPlan algorithm, which uses poly((dH/δ)^A) queries to find a δ-optimal policy relative to any deterministic policy whose value function is linearly realizable with a parameter from a fixed-radius ball around zero. This is the first algorithm to give a polynomial query-complexity guarantee using only linear realizability of a single competing value function. Whether the computational cost is similarly bounded remains an interesting open question. The upper bound is complemented by a lower bound showing that, in the infinite-horizon episodic setting, planners that achieve constant suboptimality need exponentially many queries, either in the dimension or in the number of actions.

Related research

10/03/2020: Exponential Lower Bounds for Planning in MDPs With Linearly-Realizable Optimal Action-Value Functions
We consider the problem of local planning in fixed-horizon Markov Decisi...

10/05/2021: TensorPlan and the Few Actions Lower Bound for Planning in MDPs under Linear Realizability of Optimal Value Functions
We consider the minimax query complexity of online planning with a gener...

07/13/2020: Efficient Planning in Large MDPs with Weak Linear Function Approximation
Large-scale Markov decision processes (MDPs) require planning algorithms...

09/23/2020: A Sample-Efficient Algorithm for Episodic Finite-Horizon MDP with Constraints
Constrained Markov Decision Processes (CMDPs) formalize sequential decis...

07/18/2022: A Few Expert Queries Suffices for Sample-Efficient RL with Resets and Linear Value Approximation
The current paper studies sample-efficient Reinforcement Learning (RL) i...

10/21/2022: Efficient Global Planning in Large MDPs via Stochastic Primal-Dual Optimization
We propose a new stochastic primal-dual optimization algorithm for plann...

08/12/2021: Efficient Local Planning with Linear Function Approximation
We study query and computationally efficient planning algorithms with li...
