Exploration-Exploitation in Multi-Agent Competition: Convergence with Bounded Rationality

by Stefanos Leonardos, et al.

The interplay between exploration and exploitation in competitive multi-agent learning is still far from well understood. Motivated by this, we study smooth Q-learning, a prototypical learning model that explicitly captures the balance between game rewards and exploration costs. We show that smooth Q-learning always converges to the unique quantal-response equilibrium (QRE), the standard solution concept for games under bounded rationality, in weighted zero-sum polymatrix games with heterogeneous learning agents that use positive exploration rates. Complementing recent results on convergence in weighted potential games, we show that fast convergence of Q-learning in competitive settings holds regardless of the number of agents and without any need for parameter fine-tuning. As our experiments in network zero-sum games show, these theoretical results provide the guarantees needed for an algorithmic approach to the currently open problem of equilibrium selection in competitive multi-agent settings.
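To make the abstract's claim concrete, here is a minimal sketch of smooth Q-learning in the simplest competitive case: a two-player zero-sum game rather than a general weighted polymatrix game. It assumes a common discrete-time form in which each agent moves its Q-values a step `alpha` toward the expected payoff of each action and plays the softmax (Boltzmann) strategy softmax(Q / T), where the temperature T > 0 plays the role of the exploration rate. A rest point then satisfies the QRE condition x = softmax(A y / T1), y = softmax(-A^T x / T2). All function names, the step size, and the Euler-style update are illustrative assumptions, not the authors' code.

```python
import math


def softmax(values, temperature):
    """Boltzmann (logit) choice rule; a higher temperature means more exploration."""
    m = max(values)  # subtract the max for numerical stability
    weights = [math.exp((v - m) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def expected_payoffs(matrix, opponent_strategy):
    """Expected payoff of each pure action (row) against a mixed strategy."""
    return [sum(a * p for a, p in zip(row, opponent_strategy)) for row in matrix]


def column_player_matrix(A):
    """Zero-sum: the column player's payoff matrix is -A^T."""
    return [[-A[i][j] for i in range(len(A))] for j in range(len(A[0]))]


def smooth_q_learning(A, T1, T2, q1, q2, alpha=0.1, steps=2000):
    """Hypothetical discrete-time smooth Q-learning for a 2-player zero-sum game.

    Each agent moves its Q-values a step alpha toward the expected payoff of
    every action and plays softmax(Q / T). T1, T2 > 0 are the (possibly
    different) exploration rates of the two agents.
    """
    B = column_player_matrix(A)
    q1, q2 = list(q1), list(q2)
    for _ in range(steps):
        x, y = softmax(q1, T1), softmax(q2, T2)
        r1, r2 = expected_payoffs(A, y), expected_payoffs(B, x)
        q1 = [q + alpha * (r - q) for q, r in zip(q1, r1)]
        q2 = [q + alpha * (r - q) for q, r in zip(q2, r2)]
    return softmax(q1, T1), softmax(q2, T2)


def qre_residual(A, x, y, T1, T2):
    """Max distance from the QRE conditions x = softmax(A y / T1), y = softmax(B x / T2)."""
    B = column_player_matrix(A)
    bx = softmax(expected_payoffs(A, y), T1)
    by = softmax(expected_payoffs(B, x), T2)
    return max(abs(u - v) for u, v in zip(x + y, bx + by))


if __name__ == "__main__":
    # Matching pennies with heterogeneous exploration rates, started off-equilibrium
    A = [[1.0, -1.0], [-1.0, 1.0]]
    x, y = smooth_q_learning(A, T1=0.5, T2=1.0, q1=[1.0, 0.0], q2=[0.0, 0.5])
    print(x, y, qre_residual(A, x, y, 0.5, 1.0))
```

In matching pennies the unique QRE is uniform play for any positive temperatures, so both strategies should settle near (0.5, 0.5) even though the agents explore at different rates. As T tends to 0, limit points of the QRE are Nash equilibria; with small temperatures this discretized update may need a smaller `alpha` to remain stable.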
