The Role of Lookahead and Approximate Policy Evaluation in Policy Iteration with Linear Value Function Approximation

09/28/2021
by Anna Winnicki, et al.

When the sizes of the state and action spaces are large, solving MDPs can be computationally prohibitive even if the probability transition matrix is known. In practice, a number of techniques are therefore used to approximately solve the dynamic programming problem, including lookahead, approximate policy evaluation using an m-step return, and function approximation. In a recent paper, Efroni et al. (2019) studied the impact of lookahead on the convergence rate of approximate dynamic programming. In this paper, we show that these convergence results change dramatically when function approximation is used in conjunction with lookahead and approximate policy evaluation using an m-step return. Specifically, we show that when linear function approximation is used to represent the value function, a certain minimum amount of lookahead and multi-step return is needed for the algorithm to even converge. When this condition is met, we characterize the finite-time performance of policies obtained using such approximate policy iteration. Our results are presented for two different procedures to compute the function approximation: linear least-squares regression and gradient descent.
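
The abstract combines three ingredients: lookahead for policy improvement, an m-step return for approximate policy evaluation, and a linear value-function approximation fit by least squares (or gradient descent). The sketch below illustrates how these pieces fit together on a small MDP with a known transition matrix; the function names, the NumPy least-squares fit, and the parameter choices (H, m, the random feature matrix Phi) are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def mstep_return_values(P, r, gamma, pi, V, m):
    # Approximate policy evaluation: apply the Bellman operator of policy pi
    # m times, starting from the current value estimate V (an m-step return).
    S = len(pi)
    P_pi = P[np.arange(S), pi]   # (S, S) transitions under pi
    r_pi = r[np.arange(S), pi]   # (S,)  rewards under pi
    for _ in range(m):
        V = r_pi + gamma * P_pi @ V
    return V

def lookahead_policy(P, r, gamma, V, H):
    # H-step lookahead: run H-1 optimal Bellman backups on V, then act greedily.
    for _ in range(H - 1):
        V = (r + gamma * P @ V).max(axis=1)
    Q = r + gamma * P @ V        # (S, A) lookahead action values
    return Q.argmax(axis=1)

def api_with_linear_fa(P, r, Phi, gamma=0.9, H=2, m=5, iters=20):
    # Approximate policy iteration with linear value-function approximation:
    # V is represented as Phi @ theta, and theta is refit by least squares
    # to the m-step evaluation targets at every iteration.
    theta = np.zeros(Phi.shape[1])
    pi = np.zeros(r.shape[0], dtype=int)
    for _ in range(iters):
        V_hat = Phi @ theta
        pi = lookahead_policy(P, r, gamma, V_hat, H)
        targets = mstep_return_values(P, r, gamma, pi, V_hat, m)
        theta, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
    return pi, theta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A, d = 6, 3, 2                               # small random MDP, low-dim features
    P = rng.random((S, A, S)); P /= P.sum(axis=-1, keepdims=True)
    r = rng.random((S, A))
    Phi = rng.random((S, d))
    pi, theta = api_with_linear_fa(P, r, Phi, gamma=0.9, H=3, m=10)
    print("policy:", pi, "theta:", theta)
```

In the gradient-descent variant mentioned in the abstract, the least-squares fit would be replaced by gradient steps on the same regression objective; this, too, is only a sketch of that substitution.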


