Mean Field Equilibrium in Multi-Armed Bandit Game with Continuous Reward

05/03/2021
by Xiong Wang, et al.

Mean field games facilitate the analysis of multi-armed bandit (MAB) problems with a large number of agents by approximating the agents' interactions with an average effect. Existing mean field models for multi-agent MAB mostly assume a binary reward function, which makes the analysis tractable but is usually not applicable in practical scenarios. In this paper, we study the mean field bandit game with a continuous reward function. Specifically, we focus on establishing the existence and uniqueness of the mean field equilibrium (MFE), thereby guaranteeing the asymptotic stability of the multi-agent system. To accommodate the continuous reward function, we encode the learned reward into an agent state, which is in turn mapped to the agent's stochastic arm-playing policy and updated using realized observations. We show that the state evolution is upper semi-continuous, from which the existence of an MFE follows. Because Markov analysis mainly applies to discrete states, we transform the stochastic continuous state evolution into a deterministic ordinary differential equation (ODE). On this basis, we characterize a contraction mapping for the ODE that ensures a unique MFE for the bandit game. Extensive evaluations validate our MFE characterization and exhibit tight empirical regret for the MAB problem.
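The loop described in the abstract (a state that encodes learned reward, a stochastic policy derived from that state, and an update from realized observations) can be illustrated with a minimal sketch. This is not the paper's model: the softmax policy, the congestion-style reward `base[arm] * (1 - population[arm])`, the Gaussian noise, the step size, and all function and parameter names below are assumptions chosen only to make the idea concrete.

```python
import math
import random


def softmax(state, temperature=1.0):
    """Map a state vector (learned rewards) to a stochastic arm-playing policy.

    The softmax form is an assumption; the abstract only says the state is
    mapped to a stochastic policy.
    """
    peak = max(state)  # subtract the max for numerical stability
    weights = [math.exp((s - peak) / temperature) for s in state]
    total = sum(weights)
    return [w / total for w in weights]


def simulate(num_agents=200, base=(1.0, 0.8, 0.6), rounds=300, step=0.1, seed=0):
    """Run a toy mean field bandit game and return (population, states).

    `base` holds hypothetical base rewards, one per arm.
    """
    rng = random.Random(seed)
    num_arms = len(base)
    states = [[0.0] * num_arms for _ in range(num_agents)]
    population = [1.0 / num_arms] * num_arms

    for _ in range(rounds):
        # Each agent samples an arm from the policy induced by its own state.
        choices = [
            rng.choices(range(num_arms), weights=softmax(state))[0]
            for state in states
        ]
        # Mean field: the fraction of agents playing each arm this round.
        population = [choices.count(arm) / num_agents for arm in range(num_arms)]

        for state, arm in zip(states, choices):
            # Hypothetical continuous reward: shrinks as the arm gets crowded.
            reward = base[arm] * (1.0 - population[arm]) + rng.gauss(0.0, 0.05)
            # Move the learned reward toward the realized observation.
            state[arm] += step * (reward - state[arm])

    return population, states


if __name__ == "__main__":
    final_population, _ = simulate()
    print("fraction of agents per arm:", final_population)
```

In this toy setting, each agent interacts with the others only through the population fractions, which is the "average effect" approximation. Whether the fractions settle depends on the assumed reward and step size. The sketch makes no claim about the existence or uniqueness results established in the paper.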


