
Planning and Learning with Adaptive Lookahead

01/28/2022
by Aviv Rosenberg, et al.

The classical Policy Iteration (PI) algorithm alternates between greedy one-step policy improvement and policy evaluation. Recent literature shows that multi-step lookahead policy improvement leads to a better convergence rate at the expense of increased complexity per iteration. However, one cannot tell before running the algorithm which fixed lookahead horizon is best. Moreover, within a given run, using a lookahead horizon larger than one is often wasteful. In this work, we propose for the first time to dynamically adapt the multi-step lookahead horizon as a function of the state and of the value estimate. We devise two PI variants and analyze the trade-off between iteration count and computational complexity per iteration. The first variant takes a desired contraction factor as its objective and minimizes the per-iteration complexity. The second variant takes the computational complexity per iteration as input and minimizes the overall contraction factor. We then devise a corresponding DQN-based algorithm with an adaptive tree search horizon. We also include a novel enhancement for on-policy learning: a per-depth value function estimator. Lastly, we demonstrate the efficacy of our adaptive lookahead method in a maze environment and in Atari.
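To make the trade-off described above concrete, the following is a minimal sketch of tabular Policy Iteration with a fixed h-step lookahead, where the improvement step is greedy with respect to T^(h-1) V. It is not the adaptive algorithm proposed in the paper, which chooses the horizon per state. The chain MDP, the function names (`make_chain`, `lookahead_greedy`, `h_policy_iteration`), and the stopping rule are assumptions made for illustration only.

```python
def make_chain(n):
    """Hypothetical deterministic chain: action 0 = left, action 1 = right.
    Stepping right from state n-2 pays reward 1; state n-1 is absorbing."""
    P, R = [], []
    for s in range(n):
        if s == n - 1:
            P.append([[(1.0, s)], [(1.0, s)]])
            R.append([0.0, 0.0])
        else:
            P.append([[(1.0, max(s - 1, 0))], [(1.0, s + 1)]])
            R.append([0.0, 1.0 if s == n - 2 else 0.0])
    return P, R


def q_values(V, P, R, gamma, s):
    """One-step action values at state s under value estimate V."""
    return [R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
            for a in range(len(P[s]))]


def bellman_backup(V, P, R, gamma):
    """Apply the Bellman optimality operator T once."""
    return [max(q_values(V, P, R, gamma, s)) for s in range(len(P))]


def lookahead_greedy(V, P, R, gamma, h):
    """h-step greedy policy: one-step greedy w.r.t. T^(h-1) V."""
    for _ in range(h - 1):
        V = bellman_backup(V, P, R, gamma)
    policy = []
    for s in range(len(P)):
        q = q_values(V, P, R, gamma, s)
        policy.append(max(range(len(q)), key=q.__getitem__))  # ties -> lowest index
    return policy


def evaluate_policy(policy, P, R, gamma, sweeps=500):
    """Iterative policy evaluation with a fixed number of sweeps."""
    V = [0.0] * len(P)
    for _ in range(sweeps):
        V = [q_values(V, P, R, gamma, s)[policy[s]] for s in range(len(P))]
    return V


def h_policy_iteration(P, R, gamma, h, max_iters=100):
    """Alternate h-step improvement and evaluation until the policy repeats.
    Returns (policy, value estimate, number of improvement steps)."""
    V = [0.0] * len(P)
    policy = None
    for it in range(1, max_iters + 1):
        new_policy = lookahead_greedy(V, P, R, gamma, h)
        if new_policy == policy:
            return policy, V, it
        policy = new_policy
        V = evaluate_policy(policy, P, R, gamma)
    return policy, V, max_iters


P, R = make_chain(6)
policy_h1, V_h1, iters_h1 = h_policy_iteration(P, R, gamma=0.9, h=1)
policy_h5, V_h5, iters_h5 = h_policy_iteration(P, R, gamma=0.9, h=5)
```

On this toy chain both horizons reach the same policy, but the h=5 run stops after fewer improvement steps than the h=1 run, while each of its improvement steps performs four extra Bellman backups. That is the iteration-count versus per-iteration-cost tension the abstract refers to.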
