
A Multi-Arm Bandit Approach To Subset Selection Under Constraints

02/09/2021
by Ayush Deva, et al.

We explore the class of problems in which a central planner needs to select a subset of agents, each with a different quality and cost. The planner wants to maximize its utility while ensuring that the average quality of the selected agents stays above a given threshold. When the agents' qualities are known, we formulate the problem as an integer linear program (ILP) and propose a deterministic algorithm that provides an exact solution to this ILP. We then consider the setting in which the agents' qualities are unknown. We model this as a Multi-Arm Bandit (MAB) problem and propose a learning variant of the algorithm that estimates the qualities over multiple rounds. We show that after a certain number of rounds, τ, this variant outputs a subset of agents that satisfies the average quality constraint with high probability. Next, we provide bounds on τ and prove that after τ rounds the algorithm incurs a regret of O(ln T), where T is the total number of rounds. We further illustrate its efficacy through simulations. To overcome the computational limitations of the exact algorithm, we propose a polynomial-time greedy algorithm that provides an approximate solution to our ILP. We also compare the performance of the exact and greedy approaches through experiments.
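
To make the known-quality setting concrete, here is a minimal sketch of the selection problem solved by exhaustive search. It is not the paper's algorithm, and the abstract does not define the utility function: the form used here, the sum of `revenue * quality - cost` over the selected agents, is an assumption made for illustration, as are the names `best_subset`, `alpha`, and `revenue`.

```python
from itertools import combinations


def best_subset(qualities, costs, alpha, revenue=1.0):
    """Brute-force search over all subsets of agents.

    Assumed utility (hypothetical, for illustration only):
        sum over selected agents of revenue * quality - cost.
    Constraint: the average quality of the selected agents is >= alpha.
    Returns (tuple of selected indices, utility). The empty selection,
    with utility 0.0, is returned when no feasible subset does better.
    """
    n = len(qualities)
    best_value, best_set = 0.0, ()
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            avg_quality = sum(qualities[i] for i in subset) / k
            if avg_quality < alpha:
                continue  # violates the average-quality constraint
            value = sum(revenue * qualities[i] - costs[i] for i in subset)
            if value > best_value:
                best_value, best_set = value, subset
    return best_set, best_value


if __name__ == "__main__":
    # Three hypothetical agents with made-up qualities and costs.
    chosen, utility = best_subset([0.9, 0.6, 0.4], [0.2, 0.1, 0.3], alpha=0.7)
    print(chosen, round(utility, 3))
```

With these made-up numbers, agent 1 alone is infeasible (quality 0.6 is below the 0.7 threshold), but pairing it with agent 0 raises the average to 0.75, so the pair becomes feasible and yields more utility than agent 0 alone. The search is exponential in the number of agents, which is the kind of computational limitation that motivates a polynomial-time greedy approximation.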

