Doubly-Asynchronous Value Iteration: Making Value Iteration Asynchronous in Actions

07/04/2022
by Tian Tian, et al.

Value iteration (VI) is a foundational dynamic programming method, important for learning and planning in optimal control and reinforcement learning. VI proceeds in batches, where the update to the value of each state must be completed before the next batch of updates can begin. Completing a single batch is prohibitively expensive if the state space is large, rendering VI impractical for many applications. Asynchronous VI helps address the large-state-space problem by updating one state at a time, in place and in an arbitrary order. However, asynchronous VI still requires a maximization over the entire action space, making it impractical for domains with large action spaces. To address this issue, we propose doubly-asynchronous value iteration (DAVI), a new algorithm that generalizes the idea of asynchrony from states to both states and actions. More concretely, DAVI maximizes over a sampled subset of actions, which can be of any user-defined size. This simple use of sampling to reduce computation preserves theoretical properties similar to those of VI without requiring a full sweep through the entire action space in each update. In this paper, we show that DAVI converges to the optimal value function with probability one, converges at a near-geometric rate with probability 1 - δ, and returns a near-optimal policy in computation time that nearly matches a previously established bound for VI. We also empirically demonstrate DAVI's effectiveness in several experiments.
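The core idea, an in-place update of one state at a time that maximizes over only a sampled subset of actions, can be sketched as follows. This is a minimal illustration under assumptions of my own (a small tabular MDP with known transitions, and a remembered per-state greedy action included in each backup so a state's value does not regress when its best action happens not to be sampled); it is a sketch of the technique, not the paper's exact pseudocode.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy MDP (names illustrative): transition tensor P[s, a, s'],
# reward matrix R[s, a], discount gamma.
n_states, n_actions, gamma = 5, 8, 0.9
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.standard_normal((n_states, n_actions))

def davi(P, R, gamma, action_subset_size, n_updates=20000):
    """Doubly-asynchronous VI sketch: pick a state at random, back it up in
    place using a sampled subset of actions rather than the full action set."""
    n_states, n_actions = R.shape
    V = np.zeros(n_states)
    # Remembered greedy action per state (an assumption of this sketch, used
    # so each backup also re-evaluates the action behind the current value).
    greedy = np.zeros(n_states, dtype=int)
    for _ in range(n_updates):
        s = rng.integers(n_states)                       # asynchronous in states
        sampled = rng.choice(n_actions, size=action_subset_size, replace=False)
        acts = np.union1d(sampled, [greedy[s]])          # asynchronous in actions
        q = R[s, acts] + gamma * P[s, acts] @ V          # backups for these actions only
        best = np.argmax(q)
        V[s] = q[best]
        greedy[s] = acts[best]
    return V

# With action_subset_size = n_actions this reduces to ordinary asynchronous VI;
# smaller subsets trade per-update cost against more updates to converge.
V_hat = davi(P, R, gamma, action_subset_size=3)
```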


Related research

12/03/2018
AsyncQVI: Asynchronous-Parallel Q-Value Iteration for Reinforcement Learning with Near-Optimal Sample Complexity
In this paper, we propose AsyncQVI: Asynchronous-Parallel Q-value Iterat...

06/12/2022
Geometric Policy Iteration for Markov Decision Processes
Recently discovered polyhedral structures of the value function for fini...

09/10/2019
Multi-Step Greedy and Approximate Real Time Dynamic Programming
Real Time Dynamic Programming (RTDP) is a well-known Dynamic Programming...

06/30/2011
Restricted Value Iteration: Theory and Algorithms
Value iteration is a popular algorithm for finding near optimal policies...

05/10/2021
Value Iteration in Continuous Actions, States and Time
Classical value iteration approaches are not applicable to environments ...

03/14/2022
The Efficacy of Pessimism in Asynchronous Q-Learning
This paper is concerned with the asynchronous form of Q-learning, which ...

02/17/2020
Control Frequency Adaptation via Action Persistence in Batch Reinforcement Learning
The choice of the control frequency of a system has a relevant impact on...
