Exploration-Exploitation in Multi-Agent Learning: Catastrophe Theory Meets Game Theory

12/05/2020
by Stefanos Leonardos, et al.

Exploration-exploitation is a powerful and practical tool in multi-agent learning (MAL); however, its effects are far from understood. To make progress in this direction, we study a smooth analogue of Q-learning. We start by showing that our learning model has strong theoretical justification as an optimal model for studying exploration-exploitation. Specifically, we prove two results. First, smooth Q-learning has bounded regret in arbitrary games for a cost model that explicitly captures the balance between game and exploration costs. Second, in weighted potential games with heterogeneous learning agents, it always converges to the set of quantal-response equilibria (QRE), the standard solution concept for games under bounded rationality. In our main task, we then turn to measuring the effect of exploration on collective system performance. We characterize the geometry of the QRE surface in low-dimensional MAL systems and link our findings with catastrophe (bifurcation) theory. In particular, as the exploration hyperparameter evolves over time, the system undergoes phase transitions in which the number and stability of equilibria can change radically given an infinitesimal change to the exploration parameter. Based on this, we provide a formal theoretical treatment of how tuning the exploration parameter can provably lead to equilibrium selection with both positive and negative (and potentially unbounded) effects on system performance.
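
As a rough illustration of how the number of QRE can change with the exploration parameter, the following minimal sketch counts symmetric logit equilibria in a toy 2x2 coordination game. Everything in it is an assumption made for illustration and is not taken from the paper: the payoffs `a` and `b`, the softmax (logit) response with temperature `T` standing in for the exploration parameter, the restriction to symmetric equilibria, and the function names.

```python
import math


def logit_response(x, T, a=2.0, b=1.0):
    """Probability of playing A under a softmax response with temperature T,
    when the opponent plays A with probability x.

    Hypothetical payoffs: both play A -> a, both play B -> b, otherwise 0.
    """
    # Expected payoff of A minus expected payoff of B.
    diff = a * x - b * (1.0 - x)
    # Logistic function written with tanh so that small T cannot overflow.
    return 0.5 * (1.0 + math.tanh(diff / (2.0 * T)))


def count_symmetric_qre(T, a=2.0, b=1.0, grid=2000):
    """Count solutions of x = logit_response(x, T) on [0, 1].

    Roots are detected as sign changes of g(x) = logit_response(x, T) - x
    between neighbouring grid points, so this is a numerical estimate.
    """
    count = 0
    prev = logit_response(0.0, T, a, b)  # g(0)
    for i in range(1, grid + 1):
        x = i / grid
        cur = logit_response(x, T, a, b) - x
        if prev * cur < 0:
            count += 1
        prev = cur
    return count


if __name__ == "__main__":
    # Low temperature (little exploration) versus high temperature.
    for T in (0.2, 2.0):
        print(T, count_symmetric_qre(T))
```

In this toy game the low-temperature setting yields three symmetric fixed points and the high-temperature setting yields one, so somewhere in between two equilibria merge and disappear. That is the kind of abrupt change in the equilibrium set that the abstract describes, shown here only for an assumed example.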


research
06/24/2021

Exploration-Exploitation in Multi-Agent Competition: Convergence with Bounded Rationality

The interplay between exploration and exploitation in competitive multi-...
research
07/25/2020

Catastrophe by Design in Population Games: Destabilizing Wasteful Locked-in Technologies

In multi-agent environments in which coordination is desirable, the hist...
research
09/27/2018

Equilibria in Quantitative Concurrent Games

Synthesis of finite-state controllers from high-level specifications in ...
research
09/06/2016

Q-Learning with Basic Emotions

Q-learning is a simple and powerful tool in solving dynamic problems whe...
research
07/21/2011

Centric selection: a way to tune the exploration/exploitation trade-off

In this paper, we study the exploration / exploitation trade-off in cell...
research
03/03/2022

The Dynamics of Q-learning in Population Games: a Physics-Inspired Continuity Equation Model

Although learning has found wide application in multi-agent systems, its...
research
07/27/2022

Adapting the Exploration-Exploitation Balance in Heterogeneous Swarms: Tracking Evasive Targets

There has been growing interest in the use of multi-robot systems in var...
