Value Function Approximation in Zero-Sum Markov Games

12/12/2012
by Michail Lagoudakis, et al.

This paper investigates value function approximation in zero-sum Markov games, which can be viewed as a generalization of the Markov decision process (MDP) framework to the two-agent case. We generalize error bounds from MDPs to Markov games and describe generalizations of reinforcement learning algorithms to Markov games. We present a generalization of the optimal stopping problem to a two-player, simultaneous-move Markov game. For this special problem, we provide stronger bounds and can guarantee convergence for LSTD and temporal difference learning with linear value function approximation. We demonstrate the viability of value function approximation for Markov games by using the least-squares policy iteration (LSPI) algorithm to learn good policies for a soccer domain and a flow control problem.
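To make the "generalization of the MDP framework to the two-agent case" concrete, the following is a minimal sketch of exact (tabular) minimax value iteration, in which the max over actions of an MDP backup is replaced by the value of a zero-sum matrix game at each state. It is not the paper's method: the paper replaces the table with an approximator (for example, a linear one), whereas this sketch stores one value per state. All names here (`solve_2x2_zero_sum`, `minimax_value_iteration`, the `rewards` and `transitions` layouts) are hypothetical, and the sketch assumes exactly two actions per player so that a closed-form 2x2 solution can stand in for the linear program needed in the general case.

```python
from typing import List

Matrix = List[List[float]]


def solve_2x2_zero_sum(m: Matrix) -> float:
    """Value of a 2x2 zero-sum matrix game for the row (maximizing) player."""
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))  # best pure guarantee for the row player
    minimax = min(max(a, c), max(b, d))  # best pure guarantee for the column player
    if maximin == minimax:
        # A pure-strategy saddle point exists.
        return maximin
    # No saddle point: both players randomize; closed-form 2x2 game value.
    return (a * d - b * c) / (a + d - b - c)


def minimax_value_iteration(
    rewards: List[Matrix],               # rewards[s][a][o]: payoff to the maximizer
    transitions: List[List[List[List[float]]]],  # transitions[s][a][o][s_next]
    gamma: float = 0.9,
    sweeps: int = 200,
) -> List[float]:
    """Tabular minimax value iteration for a two-action-per-player Markov game."""
    n_states = len(rewards)
    values = [0.0] * n_states
    for _ in range(sweeps):
        new_values = []
        for s in range(n_states):
            # Build the stage game Q(s, a, o) = R + gamma * E[V(s')].
            q = [
                [
                    rewards[s][a][o]
                    + gamma * sum(p * values[t] for t, p in enumerate(transitions[s][a][o]))
                    for o in range(2)
                ]
                for a in range(2)
            ]
            # The MDP "max over actions" becomes "value of a matrix game".
            new_values.append(solve_2x2_zero_sum(q))
        values = new_values
    return values


if __name__ == "__main__":
    # Hypothetical one-state game: matching pennies repeated forever.
    pennies = [[[1.0, -1.0], [-1.0, 1.0]]]
    self_loop = [[[[1.0], [1.0]], [[1.0], [1.0]]]]
    print(minimax_value_iteration(pennies, self_loop))
```

With a single agent (the opponent's action having no effect), the stage game always has a pure saddle point and the backup reduces to the usual MDP maximization, which is the sense in which the Markov game framework generalizes the MDP.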

research
02/17/2020

Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium

We develop provably efficient reinforcement learning algorithms for two-...
research
12/29/2022

Function Approximation for Solving Stackelberg Equilibrium in Large Perfect Information Games

Function approximation (FA) has been a critical component in solving lar...
research
12/03/2022

Smoothing Policy Iteration for Zero-sum Markov Games

Zero-sum Markov Games (MGs) has been an efficient framework for multi-ag...
research
05/13/2014

Rate of Convergence and Error Bounds for LSTD(λ)

We consider LSTD(λ), the least-squares temporal-difference algorithm wit...
research
05/27/2019

Temporal-difference learning for nonlinear value function approximation in the lazy training regime

We discuss the approximation of the value function for infinite-horizon ...
research
03/18/2022

Infinite-Horizon Reach-Avoid Zero-Sum Games via Deep Reinforcement Learning

In this paper, we consider the infinite-horizon reach-avoid zero-sum gam...
research
01/22/2013

Properties of the Least Squares Temporal Difference learning algorithm

This paper presents four different ways of looking at the well-known Lea...
