Distributed Bandits with Heterogeneous Agents

01/23/2022
by Lin Yang, et al.

This paper tackles a multi-agent bandit setting in which M agents cooperate to solve the same instance of a K-armed stochastic bandit problem. The agents are heterogeneous: each agent has access only to a local subset of arms, and the agents are asynchronous, with different gaps between decision-making rounds. The goal of each agent is to find its optimal local arm, and agents can cooperate by sharing their observations with others. While cooperation improves learning performance, it adds communication overhead between agents. For this heterogeneous multi-agent setting, we propose two learning algorithms. We prove that both algorithms achieve order-optimal regret of O(∑_{i: Δ̃_i > 0} log T / Δ̃_i), where Δ̃_i is the minimum suboptimality gap between the mean reward of arm i and that of any local optimal arm. In addition, through a careful selection of the valuable information for cooperation, one of the two algorithms achieves a low communication complexity of O(log T). Finally, numerical experiments verify the efficiency of both algorithms.
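To make the gap Δ̃_i in the regret bound concrete, the following minimal sketch computes it for a toy instance and evaluates the sum of log T / Δ̃_i. It is an illustration only, not the paper's algorithms. It assumes that Δ̃_i is the minimum, over agents that hold arm i as a suboptimal local arm, of the gap to that agent's best local arm. The function names, the mean rewards, and the local arm sets are all hypothetical.

```python
import math


def min_local_gaps(means, local_arms):
    """Return {arm: smallest positive gap to any local optimal arm}.

    means: list of mean rewards, indexed by arm (hypothetical values).
    local_arms: list of sets, one per agent, of the arm indices it can pull.
    Assumption: only agents for which the arm is suboptimal contribute.
    """
    gaps = {}
    for arms in local_arms:
        best = max(means[i] for i in arms)  # this agent's local optimal mean
        for i in arms:
            gap = best - means[i]
            if gap > 0:
                gaps[i] = min(gap, gaps.get(i, float("inf")))
    return gaps


def regret_bound_sum(means, local_arms, horizon):
    """Evaluate sum over arms with positive gap of log(T) / gap (constants dropped)."""
    gaps = min_local_gaps(means, local_arms)
    return sum(math.log(horizon) / g for g in gaps.values())


# Toy instance: 4 arms, 2 agents with overlapping local arm sets.
means = [0.9, 0.8, 0.5, 0.3]
local_arms = [{0, 1, 2}, {1, 2, 3}]
gaps = min_local_gaps(means, local_arms)
bound = regret_bound_sum(means, local_arms, horizon=1000)
```

In this toy instance arm 0 is locally optimal for the only agent that holds it, so it has no positive gap and contributes nothing to the sum. Arm 2 is suboptimal for both agents, and the smaller of its two gaps is the one that enters the bound.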

research
02/15/2023

On-Demand Communication for Asynchronous Multi-Agent Bandits

This paper studies a cooperative multi-agent multi-armed stochastic band...
research
08/18/2022

Communication-Efficient Collaborative Best Arm Identification

We investigate top-m arm identification, a basic problem in bandit theor...
research
11/10/2021

Multi-Agent Learning for Iterative Dominance Elimination: Formal Barriers and New Algorithms

Dominated actions are natural (and perhaps the simplest possible) multi-...
research
02/09/2021

A Multi-Arm Bandit Approach To Subset Selection Under Constraints

We explore the class of problems where a central planner needs to select...
research
06/26/2020

Dominate or Delete: Decentralized Competing Bandits with Uniform Valuation

We study regret minimization problems in a two-sided matching market whe...
research
12/01/2022

Decision Market Based Learning For Multi-agent Contextual Bandit Problems

Information is often stored in a distributed and proprietary form, and a...
research
12/29/2021

Socially-Optimal Mechanism Design for Incentivized Online Learning

Multi-arm bandit (MAB) is a classic online learning framework that studi...
