Solution of Two-Player Zero-Sum Game by Successive Relaxation

We consider the problem of a two-player zero-sum game. In this setting, two agents act against each other: both observe the same state, and each seeks a strategy that maximizes its own reward. However, the reward of the second agent is the negative of the reward obtained by the first agent, so the objective of the second agent is to minimize the total reward of the first agent. This problem is formulated in the literature as a min-max Markov game. The solution of this game, namely the max-min reward (of the first player) starting from a given state, is called the equilibrium value of that state. In this work, we compute the solution of the two-player zero-sum game using the technique of successive relaxation. Successive relaxation has been successfully applied in the literature to obtain faster value iteration algorithms for Markov Decision Processes. We extend this concept to two-player zero-sum games and prove that, under a special structure, the technique computes the optimal solution faster than existing methods. We then derive a generalized minimax Q-learning algorithm that computes the optimal policy when the model information is not known, and we prove the convergence of the proposed algorithm.
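To make the idea concrete, below is a minimal sketch of a minimax Q-learning update with a successive-relaxation term; it is an illustration of the general technique, not the paper's exact algorithm. It assumes a tabular Q of shape (states, player actions, opponent actions), evaluates each stage matrix game by a standard linear program, and places the relaxation factor w so that w = 1 recovers the classical minimax Q-learning target; the function names and the exact placement of the relaxation term are assumptions for illustration.

    import numpy as np
    from scipy.optimize import linprog

    def matrix_game_value(M):
        # Value of the zero-sum matrix game M, where the row player
        # maximizes and the column player minimizes. Computed as a
        # linear program over the row player's mixed strategy x.
        m, n = M.shape
        c = np.zeros(m + 1)
        c[-1] = -1.0                                # maximize v <=> minimize -v
        A_ub = np.hstack([-M.T, np.ones((n, 1))])   # v <= x^T M[:, o] for every o
        b_ub = np.zeros(n)
        A_eq = np.ones((1, m + 1))
        A_eq[0, -1] = 0.0                           # probabilities sum to one
        b_eq = np.array([1.0])
        bounds = [(0, None)] * m + [(None, None)]   # x >= 0, v unrestricted
        res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                      A_eq=A_eq, b_eq=b_eq, bounds=bounds)
        return res.x[-1]

    def sor_minimax_q_update(Q, s, a, o, r, s_next, alpha, gamma, w):
        # One successive-relaxation minimax Q-learning step on a tabular
        # Q of shape (num_states, num_actions, num_opponent_actions).
        # With w = 1 this reduces to the classical minimax Q-learning
        # target r + gamma * val(Q[s_next]).
        target = (w * (r + gamma * matrix_game_value(Q[s_next]))
                  + (1.0 - w) * matrix_game_value(Q[s]))
        Q[s, a, o] += alpha * (target - Q[s, a, o])

Here w plays the role of the relaxation parameter in successive over-relaxation value iteration; in the MDP setting its admissible range depends on the self-transition probabilities of the model, and an appropriate choice is what yields a faster contraction than the standard update.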
