Bandit learning in concave N-person games

10/03/2018
by Mario Bravo, et al.

This paper examines the long-run behavior of learning with bandit feedback in non-cooperative concave games. The bandit framework accounts for extremely low-information environments where the agents may not even know they are playing a game; as such, the agents' most sensible choice in this setting would be to employ a no-regret learning algorithm. In general, this does not mean that the players' behavior stabilizes in the long run: no-regret learning may lead to cycles, even with perfect gradient information. However, if a standard monotonicity condition is satisfied, our analysis shows that no-regret learning based on mirror descent with bandit feedback converges to Nash equilibrium with probability 1. We also derive an upper bound for the convergence rate of the process that nearly matches the best attainable rate for single-agent bandit stochastic optimization.
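As a rough illustration of the kind of process the abstract describes, the sketch below runs a bandit learning loop on a toy two-player concave game. It is not the authors' algorithm as stated in the paper: the payoff function `payoff`, the equilibrium targets `TARGETS`, the driver `bandit_descent`, and the step-size and query-radius schedules are all assumptions chosen for illustration. Each player perturbs its action at random, observes only its own realized payoff, forms a one-point gradient estimate from that single number, and takes a projected gradient step (Euclidean projection being one simple instance of mirror descent). The toy payoffs vanish at equilibrium, which keeps the estimator's noise small there; a generic game would likely converge more slowly.

```python
import random

# Hypothetical game parameters: the unique Nash equilibrium is x* = TARGETS.
TARGETS = (0.3, 0.6)


def payoff(i, x):
    """Payoff of player i in a toy two-player game on [0, 1]^2.

    Concave in the player's own action; the game's gradient field is
    strongly monotone, so the equilibrium is unique.
    """
    j = 1 - i
    di, dj = x[i] - TARGETS[i], x[j] - TARGETS[j]
    return -di * di - 0.5 * di * dj


def bandit_descent(steps=20000, seed=0):
    """Run the bandit learning loop; return final pivots and played actions."""
    rng = random.Random(seed)
    x = [0.75, 0.25]  # arbitrary starting pivots
    played = []
    for n in range(1, steps + 1):
        gamma = 1.0 / n                # step size (assumed schedule)
        delta = 0.25 / n ** (1.0 / 3)  # query radius (assumed schedule)
        # Keep pivots in [delta, 1 - delta] so perturbed actions stay in [0, 1].
        x = [min(max(xi, delta), 1.0 - delta) for xi in x]
        # In one dimension the unit sphere is {-1, +1}.
        z = [rng.choice((-1.0, 1.0)) for _ in x]
        action = [xi + delta * zi for xi, zi in zip(x, z)]
        played.append(action)
        # Bandit feedback: each player sees only its own realized payoff.
        u = [payoff(i, action) for i in range(2)]
        # One-point gradient estimate built from that single payoff value.
        grad = [u[i] * z[i] / delta for i in range(2)]
        # Ascent step on the estimated payoff gradient.
        x = [x[i] + gamma * grad[i] for i in range(2)]
    x = [min(max(xi, 0.0), 1.0) for xi in x]
    return x, played


if __name__ == "__main__":
    final, _ = bandit_descent()
    print("final pivots:", final, "equilibrium:", TARGETS)
```

Shrinking the feasible interval by the query radius is a simplification that keeps every played action inside the action set; other feasibility adjustments are possible.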

research
05/14/2022

No-regret learning for repeated non-cooperative games with lossy bandits

This paper considers no-regret learning for repeated continuous-kernel g...

research
12/06/2021

Optimal No-Regret Learning in Strongly Monotone Games with Bandit Feedback

We consider online no-regret learning in unknown games with bandit feedb...

research
06/15/2019

Learning in Cournot Games with Limited Information Feedback

In this work, we study the interaction of strategic players in continuou...

research
06/27/2023

Semi Bandit Dynamics in Congestion Games: Convergence to Nash Equilibrium and No-Regret Guarantees

In this work, we introduce a new variant of online gradient descent, whi...

research
08/19/2022

Learning in Stackelberg Games with Non-myopic Agents

We study Stackelberg games where a principal repeatedly interacts with a...

research
06/09/2020

Stochastic matrix games with bandit feedback

We study a version of the classical zero-sum matrix game with unknown pa...

research
09/06/2022

A Zeroth-Order Momentum Method for Risk-Averse Online Convex Games

We consider risk-averse learning in repeated unknown games where the goa...
