A Survey of Learning in Multiagent Environments: Dealing with Non-Stationarity

07/28/2017
by Pablo Hernandez-Leal, et al.

The key challenge in multiagent learning is learning a best response to the behaviour of other agents, which may be non-stationary: if the other agents adapt their strategies as well, the learning target moves. Disparate streams of research have approached non-stationarity from several angles, each making a variety of implicit assumptions, which makes it hard to keep an overview of the state of the art and to validate the innovation and significance of new works. This survey presents a coherent overview of work that addresses opponent-induced non-stationarity with tools from game theory, reinforcement learning and multi-armed bandits. Further, we reflect on the principal approaches by which algorithms model and cope with this non-stationarity, arriving at a new framework and five categories (in increasing order of sophistication): ignore, forget, respond to target models, learn models, and theory of mind. A wide range of state-of-the-art algorithms is classified into a taxonomy, using these categories and key characteristics of the environment (e.g., observability) and of the adaptation behaviour of the opponents (e.g., smooth, abrupt). To clarify further, we present illustrative variations of one domain, contrasting the strengths and limitations of each category. Finally, we discuss in which environments the different approaches yield the most merit, and point to promising avenues of future research.

