Smoothing Policy Iteration for Zero-sum Markov Games

12/03/2022
by Yangang Ren, et al.

Zero-sum Markov Games (MGs) have been an effective framework for multi-agent systems and robust control, wherein a minimax problem is constructed to solve for the equilibrium policies. At present, this formulation is well studied under tabular settings, where the maximum operator is solved exactly to calculate the worst-case value function. However, it is non-trivial to extend such methods to complex tasks, since finding the maximum over large-scale action spaces is usually intractable. In this paper, we propose the smoothing policy iteration (SPI) algorithm to solve zero-sum MGs approximately, where the maximum operator is replaced by the weighted LogSumExp (WLSE) function to obtain nearly optimal equilibrium policies. Specifically, the adversarial policy serves as the weight function, which enables efficient sampling over the action space. We also prove the convergence of SPI and analyze its approximation error in the ∞-norm based on the contraction mapping theorem. Furthermore, we propose a model-based algorithm called Smooth adversarial Actor-critic (SaAC) by extending SPI with function approximation. The target value related to the WLSE function is evaluated from sampled trajectories, and a mean-square error is constructed to optimize the value function, while gradient ascent-descent is adopted to optimize the protagonist and adversarial policies jointly. In addition, we incorporate the reparameterization technique into model-based gradient back-propagation to prevent gradient vanishing caused by sampling from the stochastic policies. We verify our algorithm in both tabular and function-approximation settings. Results show that SPI can approximate the worst-case value function with high accuracy, and that SaAC can stabilize the training process and improve adversarial robustness by a large margin.
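The abstract does not spell out the exact form of the WLSE operator, so the following is only a minimal numerical sketch of the idea, assuming the standard weighted LogSumExp surrogate tau * log(sum_a w(a) * exp(Q(a)/tau)) with the adversarial policy playing the role of the weight w; the function name, the temperature values, and the uniform toy weights below are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np

def weighted_logsumexp(q_values, weights, tau=0.1):
    """Smooth surrogate for max_a q(a): tau * log( sum_a w(a) * exp(q(a)/tau) ).

    As tau -> 0 this approaches max_a q(a) (up to a small weight-dependent
    offset), and because the sum is an expectation under w, it can be
    estimated from actions sampled from w (here, the adversarial policy).
    """
    q = np.asarray(q_values, dtype=float)
    w = np.asarray(weights, dtype=float)
    m = q.max()  # subtract the max before exponentiating for numerical stability
    return m + tau * np.log(np.sum(w * np.exp((q - m) / tau)))

# Toy check: the smoothed value approaches the exact maximum as tau shrinks.
q = np.array([1.0, 2.5, 0.3, 2.4])
w = np.ones_like(q) / q.size  # uniform "adversarial" policy, purely for illustration
for tau in (1.0, 0.1, 0.01):
    print(f"tau={tau}: WLSE={weighted_logsumexp(q, w, tau):.3f}, max={q.max():.3f}")
```

Because the weighted sum is an expectation under w, it can be estimated by Monte Carlo from actions drawn from the adversarial policy, which is what makes such an operator tractable over large action spaces, as the abstract suggests.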

Related research
12/12/2012

Value Function Approximation in Zero-Sum Markov Games

This paper investigates value function approximation in the context of z...
04/05/2021

NQMIX: Non-monotonic Value Function Factorization for Deep Multi-Agent Reinforcement Learning

Multi-agent value-based approaches recently make great progress, especia...
10/05/2021

Robustness and sample complexity of model-based MARL for general-sum Markov games

Multi-agent reinforcement learning (MARL) is often modeled using the fram...
03/06/2021

Zero-Sum Semi-Markov Games with State-Action-Dependent Discount Factors

Semi-Markov model is one of the most general models for stochastic dynam...
05/02/2018

Approximate Temporal Difference Learning is a Gradient Descent for Reversible Policies

In reinforcement learning, temporal difference (TD) is the most direct a...
12/29/2022

Function Approximation for Solving Stackelberg Equilibrium in Large Perfect Information Games

Function approximation (FA) has been a critical component in solving lar...
04/10/2019

Solving Dynamic Discrete Choice Models Using Smoothing and Sieve Methods

We propose to combine smoothing, simulations and sieve approximations to...
