Multiagent Value Iteration Algorithms in Dynamic Programming and Reinforcement Learning

05/04/2020
by Dimitri Bertsekas, et al.

We consider infinite horizon dynamic programming problems, where the control at each stage consists of several distinct decisions, each one made by one of several agents. In an earlier work we introduced a policy iteration algorithm, where the policy improvement is done one-agent-at-a-time in a given order, with knowledge of the choices of the preceding agents in the order. As a result, the amount of computation for each policy improvement grows linearly with the number of agents, as opposed to exponentially for the standard all-agents-at-once method. For the case of a finite-state discounted problem, we showed convergence to an agent-by-agent optimal policy. In this paper, this result is extended to value iteration and optimistic versions of policy iteration, as well as to more general DP problems where the Bellman operator is a contraction mapping, such as stochastic shortest path problems with all policies being proper.
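
To make the computational claim concrete, the following is a minimal Python sketch of a single one-agent-at-a-time improvement step at one state. It is an illustration under stated assumptions, not the algorithm as specified in the paper: the toy problem generator `build_random_mdp`, the helper `q_factor`, the function `improve_one_agent_at_a_time`, and all parameter values are hypothetical names and choices introduced here. Each agent in turn minimizes the Q-factor over its own component, with the preceding agents' new choices fixed and the later agents' components taken from the current policy, so the number of Q-factor evaluations is the sum of the agents' action-set sizes rather than their product.

```python
import itertools
import random


def build_random_mdp(num_states, action_sizes, seed=0):
    """Hypothetical toy problem: random stage costs and transition rows."""
    rng = random.Random(seed)
    cost, trans = {}, {}
    for x in range(num_states):
        # A joint control u is a tuple with one component per agent.
        for u in itertools.product(*(range(n) for n in action_sizes)):
            cost[x, u] = rng.random()
            weights = [rng.random() for _ in range(num_states)]
            total = sum(weights)
            trans[x, u] = [w / total for w in weights]
    return cost, trans


def q_factor(x, u, J, cost, trans, alpha):
    """Discounted Q-factor: stage cost plus alpha times expected cost-to-go."""
    return cost[x, u] + alpha * sum(p * J[y] for y, p in enumerate(trans[x, u]))


def improve_one_agent_at_a_time(x, mu_x, J, cost, trans, alpha, action_sizes):
    """Improve the joint control at state x one agent at a time.

    Agents are processed in index order. Agent i sees the components already
    chosen by agents 0..i-1 and the current policy's components for i+1 onward.
    Returns the new joint control, its Q-factor, and the evaluation count.
    """
    u = list(mu_x)
    evaluations = 0
    best_q = float("inf")
    for agent, n in enumerate(action_sizes):
        best_c, best_q = None, float("inf")
        for c in range(n):
            u[agent] = c
            q = q_factor(x, tuple(u), J, cost, trans, alpha)
            evaluations += 1
            if q < best_q:
                best_c, best_q = c, q
        u[agent] = best_c  # fix this agent's choice for the agents that follow
    return tuple(u), best_q, evaluations


# Usage: 4 agents with 3 choices each (assumed sizes for illustration).
action_sizes = (3, 3, 3, 3)
num_states, alpha = 5, 0.9
cost, trans = build_random_mdp(num_states, action_sizes)
J = [0.0] * num_states                      # current cost-to-go estimate
mu_x = (0,) * len(action_sizes)             # current policy's control at state 0
new_u, new_q, evals = improve_one_agent_at_a_time(
    0, mu_x, J, cost, trans, alpha, action_sizes
)
print(new_u, new_q, evals)
```

With these assumed sizes the step uses 3 + 3 + 3 + 3 = 12 Q-factor evaluations, whereas minimizing over all joint controls at once would need 3^4 = 81. Because each agent's search includes its current component, the resulting Q-factor is never larger than that of the current policy's control at the same state. The sketch shows only the improvement step; how such steps are combined with value updates, and under what conditions the result converges, is the subject of the paper itself.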


research
09/30/2019

Multiagent Rollout Algorithms and Reinforcement Learning

We consider finite and infinite horizon dynamic programming problems, wh...
research
06/01/2021

On-Line Policy Iteration for Infinite Horizon Dynamic Programming

In this paper we propose an on-line policy iteration (PI) algorithm for ...
research
12/19/2013

The Value Iteration Algorithm is Not Strongly Polynomial for Discounted Dynamic Programming

This note provides a simple example demonstrating that, if exact computa...
research
09/19/2016

Incremental Sampling-based Motion Planners Using Policy Iteration Methods

Recent progress in randomized motion planners has led to the development...
research
07/22/2021

Distributed Asynchronous Policy Iteration for Sequential Zero-Sum Games and Minimax Control

We introduce a contractive abstract dynamic programming framework and re...
research
10/27/2021

A Subgame Perfect Equilibrium Reinforcement Learning Approach to Time-inconsistent Problems

In this paper, we establish a subgame perfect equilibrium reinforcement ...
research
11/09/2020

Multiagent Rollout and Policy Iteration for POMDP with Application to Multi-Robot Repair Problems

In this paper we consider infinite horizon discounted dynamic programmin...
