Multi-Agent Learning for Iterative Dominance Elimination: Formal Barriers and New Algorithms

11/10/2021
by Jibang Wu et al.

Dominated actions are natural (and perhaps the simplest possible) multi-agent generalizations of sub-optimal actions in standard single-agent decision making. Thus, as in standard bandit learning, a basic learning question in multi-agent systems is whether agents can efficiently eliminate all dominated actions in an unknown game when they observe only noisy bandit feedback about the payoffs of the actions they play. Surprisingly, despite the seemingly simple task, we show a rather negative result: standard no-regret algorithms, including the entire family of Dual Averaging algorithms, provably take exponentially many rounds to eliminate all dominated actions. Moreover, algorithms with the stronger no-swap-regret guarantee suffer a similar exponential inefficiency. To overcome these barriers, we develop a new algorithm that adjusts Exp3 with Diminishing Historical rewards (termed Exp3-DH); Exp3-DH gradually forgets history at carefully tailored rates. We prove that when all agents run Exp3-DH (i.e., self-play in multi-agent learning), all dominated actions can be iteratively eliminated within polynomially many rounds. Our experimental results further demonstrate the efficiency of Exp3-DH and show that state-of-the-art bandit algorithms, even those developed specifically for learning in games, fail to eliminate all dominated actions efficiently.
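The sketch below shows the general idea of running Exp3 on cumulative reward estimates that are discounted every round. It is only a hedged illustration in the spirit of Exp3-DH and does not reproduce the authors' algorithm. The abstract does not give the tailored forgetting rates, so several pieces are hypothetical placeholders: the discount schedule `t / (t + 1)`, the learning rate `eta`, the exploration rate `gamma`, and the class name `Exp3DiscountedSketch`. The prisoner's-dilemma payoffs used for the self-play demo are also assumed.

```python
import math
import random
from typing import Callable, List, Optional


def default_discount(t: int) -> float:
    """Hypothetical forgetting schedule (placeholder, NOT the paper's tailored rate):
    the accumulated history is multiplied by t / (t + 1) before round t's estimate is added."""
    return t / (t + 1)


class Exp3DiscountedSketch:
    """Exp3 whose importance-weighted cumulative reward estimates are discounted each round.

    Illustrative only: eta, gamma and the discount schedule are assumptions.
    """

    def __init__(self, n_actions: int, eta: float = 0.1, gamma: float = 0.05,
                 discount: Callable[[int], float] = default_discount,
                 rng: Optional[random.Random] = None) -> None:
        self.k = n_actions
        self.eta = eta              # learning rate (assumed constant)
        self.gamma = gamma          # weight of uniform exploration in the policy
        self.discount = discount
        self.rng = rng or random.Random()
        self.scores = [0.0] * n_actions  # discounted sums of reward estimates
        self.t = 0
        self._p = self.probabilities()

    def probabilities(self) -> List[float]:
        # Exponential weights over the discounted scores (shifted by the max for
        # numerical stability), mixed with uniform exploration so that every
        # action keeps probability at least gamma / k.
        m = max(self.scores)
        w = [math.exp(self.eta * (s - m)) for s in self.scores]
        total = sum(w)
        return [(1 - self.gamma) * wi / total + self.gamma / self.k for wi in w]

    def act(self) -> int:
        self._p = self.probabilities()
        return self.rng.choices(range(self.k), weights=self._p)[0]

    def update(self, action: int, reward: float) -> None:
        # Bandit feedback: only the played action's reward (assumed in [0, 1]) is seen.
        self.t += 1
        beta = self.discount(self.t)
        self.scores = [beta * s for s in self.scores]    # gradually forget history
        self.scores[action] += reward / self._p[action]  # importance-weighted estimate


# Self-play on an assumed prisoner's-dilemma-style game scaled to [0, 1]:
# action 0 (cooperate) is strictly dominated by action 1 (defect) for both players.
PAYOFF = [[0.6, 0.0], [1.0, 0.2]]  # PAYOFF[my_action][opponent_action]
rng = random.Random(0)
row, col = Exp3DiscountedSketch(2, rng=rng), Exp3DiscountedSketch(2, rng=rng)
for _ in range(5000):
    x, y = row.act(), col.act()
    row.update(x, PAYOFF[x][y])
    col.update(y, PAYOFF[y][x])
print(row.probabilities(), col.probabilities())
```

This game has only one round of dominance, so the demo does not exercise the *iterated* elimination that the paper analyzes. Swapping `default_discount` for another schedule is the natural place to experiment with different forgetting rates.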
