Planning and Learning with Adaptive Lookahead

by Aviv Rosenberg, et al.

The classical Policy Iteration (PI) algorithm alternates between greedy one-step policy improvement and policy evaluation. Recent literature shows that multi-step lookahead policy improvement leads to a better convergence rate at the expense of increased complexity per iteration. However, before running the algorithm, one cannot tell what the best fixed lookahead horizon is. Moreover, within a given run, using a lookahead horizon larger than one is often wasteful. In this work, we propose for the first time to dynamically adapt the multi-step lookahead horizon as a function of the state and of the value estimate. We devise two PI variants and analyze the trade-off between iteration count and computational complexity per iteration. The first variant takes the desired contraction factor as the objective and minimizes the per-iteration complexity. The second variant takes the computational complexity per iteration as input and minimizes the overall contraction factor. We then devise a corresponding DQN-based algorithm with an adaptive tree search horizon. We also include a novel enhancement for on-policy learning: a per-depth value function estimator. Lastly, we demonstrate the efficacy of our adaptive lookahead method in a maze environment and in Atari.
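The core mechanism, policy iteration whose improvement step looks a different number of steps ahead in different states, can be illustrated on a small tabular MDP. The Python sketch below is not the authors' algorithm: the MDP encoding, the function names, and the `horizon(s, V)` callback are hypothetical, and the callback used here is a fixed toy rule rather than the paper's adaptive selection criterion, which the abstract does not specify. At each state, the improvement step takes the first action of an `h`-step lookahead tree that bootstraps with the current value estimate at its leaves.

```python
from typing import Callable, List, Tuple

# Hypothetical tabular MDP encoding (illustrative, not from the paper):
# P[s][a] lists (probability, next_state) pairs; R[s][a] is the immediate reward.
MDP = List[List[List[Tuple[float, int]]]]
Rewards = List[List[float]]


def q_value(P: MDP, R: Rewards, V: List[float], s: int, a: int, depth: int, gamma: float) -> float:
    """Value of taking a in s, then acting optimally for depth - 1 more steps (depth >= 1)."""
    return R[s][a] + gamma * sum(p * lookahead(P, R, V, s2, depth - 1, gamma) for p, s2 in P[s][a])


def lookahead(P: MDP, R: Rewards, V: List[float], s: int, depth: int, gamma: float) -> float:
    """Optimal depth-step return from s, bootstrapping with V at the leaves."""
    if depth == 0:
        return V[s]
    return max(q_value(P, R, V, s, a, depth, gamma) for a in range(len(P[s])))


def greedy_action(P: MDP, R: Rewards, V: List[float], s: int, depth: int, gamma: float) -> int:
    """First action of the depth-step greedy policy at s (ties go to the lowest index)."""
    return max(range(len(P[s])), key=lambda a: q_value(P, R, V, s, a, depth, gamma))


def evaluate(P: MDP, R: Rewards, pi: List[int], gamma: float, tol: float = 1e-10) -> List[float]:
    """Iterative (in-place) evaluation of a deterministic policy pi."""
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in range(len(P)):
            a = pi[s]
            v = R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V


def adaptive_lookahead_pi(
    P: MDP,
    R: Rewards,
    horizon: Callable[[int, List[float]], int],
    gamma: float = 0.9,
    max_iters: int = 100,
) -> Tuple[List[int], List[float]]:
    """Policy iteration whose improvement step uses a per-state lookahead depth."""
    pi = [0] * len(P)
    V = evaluate(P, R, pi, gamma)
    for _ in range(max_iters):
        # horizon(s, V) is a hypothetical callback; the paper's selection rule is not reproduced here.
        new_pi = [greedy_action(P, R, V, s, horizon(s, V), gamma) for s in range(len(P))]
        if new_pi == pi:  # stop once the policy no longer changes
            break
        pi = new_pi
        V = evaluate(P, R, pi, gamma)
    return pi, V


# Toy 3-state chain: action 0 = "stay", action 1 = "advance"; state 2 pays 1 per step.
chain_P: MDP = [
    [[(1.0, 0)], [(1.0, 1)]],
    [[(1.0, 1)], [(1.0, 2)]],
    [[(1.0, 2)], [(1.0, 2)]],
]
chain_R: Rewards = [[0.1, 0.0], [0.0, 0.0], [1.0, 1.0]]

# Toy rule: look two steps ahead in state 0, one step elsewhere.
pi, V = adaptive_lookahead_pi(chain_P, chain_R, horizon=lambda s, V: 2 if s == 0 else 1)
print(pi, [round(v, 3) for v in V])
```

On this chain, the depth-2 lookahead at state 0 already selects "advance" in the first improvement round, whereas a one-step greedy update at state 0 in that round still picks "stay" and needs a second round to correct it. The price is that the recursive lookahead's cost grows exponentially with the depth, which is the per-iteration complexity trade-off the abstract describes.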
