Accelerating Value Iteration with Anchoring

05/26/2023
by Jongmin Lee, et al.

Value Iteration (VI) is foundational to the theory and practice of modern reinforcement learning, and it is known to converge at an 𝒪(γ^k) rate, where γ is the discount factor. Surprisingly, however, the optimal rate for the VI setup was not known, and finding a general acceleration mechanism has been an open problem. In this paper, we present the first accelerated VI for both the Bellman consistency and optimality operators. Our method, called Anc-VI, is based on an anchoring mechanism (distinct from Nesterov's acceleration), and it reduces the Bellman error faster than standard VI. In particular, Anc-VI exhibits an 𝒪(1/k) rate for γ≈ 1 or even γ=1, while standard VI has rate 𝒪(1) for γ≥ 1-1/k, where k is the iteration count. We also provide a complexity lower bound matching the upper bound up to a constant factor of 4, thereby establishing optimality of the accelerated rate of Anc-VI. Finally, we show that the anchoring mechanism provides the same benefit in the approximate VI and Gauss–Seidel VI setups as well.
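To make the anchoring idea concrete, the following is a minimal sketch of an anchored value iteration on a small tabular MDP, contrasted with standard VI. Here each anchored iterate is a convex combination of the fixed initial point (the anchor) and the Bellman optimality update of the previous iterate. The coefficient schedule beta_k = 1/(k+1), the function names, and the random MDP are illustrative assumptions for this sketch and are not taken from the paper; the exact Anc-VI schedule in the paper depends on γ.

import numpy as np

def bellman_optimality(V, P, R, gamma):
    # P: (A, S, S) transition tensor, R: (A, S) rewards.
    # Returns (T V)(s) = max_a [ R(a, s) + gamma * sum_{s'} P(a, s, s') V(s') ].
    return np.max(R + gamma * (P @ V), axis=0)

def standard_vi(P, R, gamma, num_iters, V0=None):
    # Plain fixed-point iteration V <- T V.
    V = np.zeros(P.shape[1]) if V0 is None else V0.copy()
    for _ in range(num_iters):
        V = bellman_optimality(V, P, R, gamma)
    return V

def anchored_vi(P, R, gamma, num_iters, V0=None):
    # Anchored iteration: convex combination of the anchor V0 and the Bellman update.
    V0 = np.zeros(P.shape[1]) if V0 is None else V0.copy()
    V = V0.copy()
    for k in range(1, num_iters + 1):
        beta = 1.0 / (k + 1)  # illustrative Halpern-style weight; shrinks to 0 as k grows
        V = beta * V0 + (1.0 - beta) * bellman_optimality(V, P, R, gamma)
    return V

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A, gamma = 5, 3, 0.999  # near-undiscounted regime, where acceleration is claimed to matter most
    P = rng.random((A, S, S))
    P /= P.sum(axis=2, keepdims=True)  # row-stochastic transitions
    R = rng.random((A, S))
    for name, V in [("VI", standard_vi(P, R, gamma, 200)),
                    ("Anchored VI", anchored_vi(P, R, gamma, 200))]:
        bellman_err = np.max(np.abs(bellman_optimality(V, P, R, gamma) - V))
        print(f"{name}: sup-norm Bellman error {bellman_err:.3e}")

The Bellman error printed at the end is the quantity the paper's rates are stated in terms of; running this sketch simply illustrates the two update rules side by side, not the paper's specific constants.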
