Learning Nash Equilibria in Zero-Sum Stochastic Games via Entropy-Regularized Policy Approximation

09/01/2020
by Qifan Zhang, et al.

We explore the use of policy approximation to reduce the computational cost of learning Nash equilibria in multi-agent reinforcement learning. We propose a new algorithm for zero-sum stochastic games in which each agent simultaneously learns a Nash policy and an entropy-regularized policy. The two policies help each other converge: the former guides the latter toward the desired Nash equilibrium, while the latter serves as an efficient approximation of the former. We demonstrate that the proposed algorithm can be used to transfer previous training experience to different environments, enabling the agents to adapt quickly to new tasks. We also provide a dynamic hyper-parameter scheduling scheme that further expedites convergence. Empirical results on a number of stochastic games show that the proposed algorithm converges to the Nash equilibrium while exhibiting a major speed-up over existing algorithms.
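
To give a feel for what an entropy-regularized policy looks like, here is a minimal single-state sketch. It is not the algorithm proposed in the paper. It assumes the standard maximum-entropy form, in which the regularized policy is a softmax over action values and the regularized value is a log-sum-exp. The names `soft_policy`, `soft_value`, the inverse-temperature parameter `beta`, and the action values in `q` are all hypothetical and chosen for illustration only.

```python
import math


def soft_policy(q_values, beta):
    """Entropy-regularized (softmax) policy over action values.

    beta is an assumed inverse-temperature parameter: a small beta gives a
    near-uniform policy, a large beta approaches the greedy policy.
    """
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def soft_value(q_values, beta):
    """Soft maximum (1/beta) * log(sum(exp(beta * q))).

    This equals the best achievable expected value plus (1/beta) times the
    policy entropy, so it is never below max(q_values).
    """
    m = max(q_values)
    log_sum = math.log(sum(math.exp(beta * (q - m)) for q in q_values))
    return m + log_sum / beta


# Hypothetical action values for a single state.
q = [1.0, 0.5, -0.2]

for beta in (0.1, 1.0, 10.0, 100.0):
    pi = soft_policy(q, beta)
    print(beta, [round(p, 3) for p in pi], round(soft_value(q, beta), 3))
```

As `beta` grows, the softmax policy concentrates on the highest-valued action and the soft value approaches the ordinary maximum, which is the sense in which a regularized policy can stand in as a cheap approximation of an unregularized one. Both functions are closed-form, so no optimization problem has to be solved per state.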

research
05/23/2023

Zero-sum Polymatrix Markov Games: Equilibrium Collapse and Efficient Computation of Nash Equilibria

The works of (Daskalakis et al., 2009, 2022; Jin et al., 2022; Deng et a...
research
10/12/2022

Zero-Knowledge Optimal Monetary Policy under Stochastic Dominance

Optimal simple rules for the monetary policy of the first stochastically...
research
10/14/2022

Decentralized Policy Gradient for Nash Equilibria Learning of General-sum Stochastic Games

We study Nash equilibria learning of a general-sum stochastic game with ...
research
09/13/2020

Efficient Competitive Self-Play Policy Optimization

Reinforcement learning from self-play has recently reported many success...
research
11/22/2012

A hybrid cross entropy algorithm for solving dynamic transit network design problem

This paper proposes a hybrid multiagent learning algorithm for solving t...
research
06/07/2018

Re-evaluating evaluation

Progress in machine learning is measured by careful evaluation on proble...
research
04/23/2019

Deep Q-Learning for Nash Equilibria: Nash-DQN

Model-free learning for multi-agent stochastic games is an active area o...
