On Convergence and Optimality of Best-Response Learning with Policy Types in Multiagent Systems

07/15/2019
by Stefano V. Albrecht, et al.

While many multiagent algorithms are designed for homogeneous systems (i.e., all agents are identical), there are important applications that require an agent to coordinate its actions without knowing a priori how the other agents behave. One method to make this problem feasible is to assume that the other agents draw their latent policy (or type) from a specific set, and that a domain expert can provide a specification of this set, albeit only a partially correct one. Several researchers have proposed algorithms that compute posterior beliefs over such policy libraries, which can then be used to determine optimal actions. In this paper, we provide theoretical guidance on two central design parameters of this method. First, the user must choose a posterior that can learn the true distribution of latent types, as otherwise suboptimal actions may be chosen. We analyse convergence properties of two existing posterior formulations and propose a new posterior that can learn correlated distributions. Second, since the types are provided by an expert, they may be inaccurate in the sense that they do not predict the agents' observed actions. We provide a novel characterisation of optimality that allows experts to use efficient model checking algorithms to verify optimality of types.
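
As a rough illustration of what a posterior belief over a policy library can look like, the sketch below multiplies a prior over types by the likelihood each type assigns to the observed actions, then normalises. This product form is only one possible formulation; the abstract does not say which formulations the paper analyses, so it should not be read as the authors' method. The names `product_posterior`, `mostly_left` and `mostly_right`, the action labels, and the fallback to the prior when no type explains the data are all assumptions made for this example.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical representation: a type maps the history of observed actions
# to a probability distribution over the other agent's next action.
Policy = Callable[[Sequence[str]], Dict[str, float]]


def product_posterior(
    prior: Dict[str, float],
    types: Dict[str, Policy],
    observed_actions: Sequence[str],
) -> Dict[str, float]:
    """Posterior over types: prior times the product of action likelihoods."""
    weights = dict(prior)
    history: List[str] = []
    for action in observed_actions:
        for name, policy in types.items():
            # Likelihood that this type assigns to the observed action.
            weights[name] *= policy(history).get(action, 0.0)
        history.append(action)
    total = sum(weights.values())
    if total == 0.0:
        # Assumption for this sketch: if no type explains the observations,
        # fall back to the prior instead of dividing by zero.
        return dict(prior)
    return {name: w / total for name, w in weights.items()}


# Two illustrative (made-up) types that ignore the history.
def mostly_left(history: Sequence[str]) -> Dict[str, float]:
    return {"L": 0.9, "R": 0.1}


def mostly_right(history: Sequence[str]) -> Dict[str, float]:
    return {"L": 0.2, "R": 0.8}


types = {"mostly_left": mostly_left, "mostly_right": mostly_right}
prior = {"mostly_left": 0.5, "mostly_right": 0.5}
posterior = product_posterior(prior, types, ["L", "L", "R"])
print(posterior)
```

With these made-up numbers the belief shifts towards `mostly_left`, since that type assigns higher probability to the two observed `L` actions. An agent would then choose its own action by weighting each type's predicted behaviour by this belief.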

research
07/10/2019

An Empirical Study on the Practical Impact of Prior Beliefs over Policy Types

Many multiagent applications require an agent to learn quickly how to in...
research
07/28/2015

Belief and Truth in Hypothesised Behaviours

There is a long history in game theory on the topic of Bayesian or "rati...
research
06/26/2019

Reasoning about Hypothetical Agent Behaviours and their Parameters

Agents can achieve effective interaction with previously unknown other a...
research
02/04/2014

Learning by Observation of Agent Software Images

Learning by observation can be of key importance whenever agents sharing...
research
07/23/2019

E-HBA: Using Action Policies for Expert Advice and Agent Typification

Past research has studied two approaches to utilise predefined policy se...
research
05/09/2023

Latent Interactive A2C for Improved RL in Open Many-Agent Systems

There is a prevalence of multiagent reinforcement learning (MARL) method...
research
06/16/2020

Local Information Opponent Modelling Using Variational Autoencoders

Modelling the behaviours of other agents (opponents) is essential for un...