Confident Approximate Policy Iteration for Efficient Local Planning in q^π-realizable MDPs

10/27/2022
by Gellért Weisz, et al.

We consider approximate dynamic programming in γ-discounted Markov decision processes and apply it to approximate planning with linear value-function approximation. Our first contribution is a new variant of Approximate Policy Iteration (API), called Confident Approximate Policy Iteration (CAPI), which computes a deterministic stationary policy with an optimal error bound scaling linearly with the product of the effective horizon H and the worst-case approximation error ϵ of the action-value functions of stationary policies. This improvement over API (whose error scales with H^2) comes at the price of an H-fold increase in memory cost. Unlike Scherrer and Lesner [2012], who recommended computing a non-stationary policy to achieve a similar improvement (with the same memory overhead), we are able to stick to stationary policies. This allows for our second contribution, the application of CAPI to planning with local access to a simulator and d-dimensional linear function approximation. As such, we design a planning algorithm that applies CAPI to obtain a sequence of policies with successively refined accuracies on a dynamically evolving set of states. The algorithm outputs an Õ(√(d)Hϵ)-optimal policy after issuing Õ(dH^4/ϵ^2) queries to the simulator, simultaneously achieving the optimal accuracy bound and the best known query complexity bound, while earlier algorithms in the literature achieve only one of them. This query complexity is shown to be tight in all parameters except H. These improvements come at the expense of a mild (polynomial) increase in memory and computational costs of both the algorithm and its output policy.
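To make the scalings quoted in the abstract concrete, the following minimal sketch evaluates them for given parameters. It rests on assumptions not stated in the abstract: the effective horizon is taken to be H = 1/(1 − γ), which is the usual convention, and all constants and logarithmic factors hidden by the Õ notation are dropped. The function names are hypothetical and the outputs are orders of magnitude only, not the paper's exact bounds.

```python
import math


def effective_horizon(gamma: float) -> float:
    """Effective horizon, ASSUMED here to be H = 1 / (1 - gamma)."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return 1.0 / (1.0 - gamma)


def scaling_summary(gamma: float, eps: float, d: int) -> dict:
    """Evaluate the scalings from the abstract with constants and logs dropped.

    gamma: discount factor
    eps:   worst-case approximation error of the action-value functions
    d:     dimension of the linear function approximation
    """
    H = effective_horizon(gamma)
    return {
        "H": H,
        # Classical API: error scales with H^2 * eps.
        "api_error": H ** 2 * eps,
        # CAPI: error scales with H * eps.
        "capi_error": H * eps,
        # Local planner: suboptimality of order sqrt(d) * H * eps.
        "planner_error": math.sqrt(d) * H * eps,
        # Local planner: simulator queries of order d * H^4 / eps^2.
        "planner_queries": d * H ** 4 / eps ** 2,
    }


if __name__ == "__main__":
    for name, value in scaling_summary(gamma=0.99, eps=0.01, d=16).items():
        print(f"{name}: {value:.4g}")
```

Under these assumptions the ratio `api_error / capi_error` equals H, which is the factor by which CAPI improves on API at the cost of the H-fold memory increase mentioned above.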


