
A Multi-Arm Bandit Approach To Subset Selection Under Constraints

by Ayush Deva, et al.

We explore the class of problems in which a central planner must select a subset of agents, each with a different quality and cost. The planner wants to maximize its utility while ensuring that the average quality of the selected agents stays above a given threshold. When the agents' qualities are known, we formulate the problem as an integer linear program (ILP) and propose a deterministic algorithm that provides an exact solution to the ILP. We then consider the setting in which the agents' qualities are unknown. We model this as a Multi-Arm Bandit (MAB) problem and propose a learning algorithm that estimates the qualities over multiple rounds. We show that after a certain number of rounds, τ, the learning algorithm outputs a subset of agents that satisfies the average quality constraint with high probability. Next, we provide bounds on τ and prove that after τ rounds, the algorithm incurs a regret of O(ln T), where T is the total number of rounds. We further illustrate the efficacy of the learning algorithm through simulations. To overcome the computational limitations of the exact algorithm, we propose a polynomial-time greedy algorithm that provides an approximate solution to the ILP. We also compare the performance of the exact and greedy algorithms through experiments.
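To make the known-quality setting concrete, the following is a minimal sketch of the constrained subset selection problem, solved by exhaustive search. It is not the deterministic or greedy algorithm from the paper. The utility function (a per-unit reward for quality minus cost, summed over selected agents), the function name `best_subset`, and the parameter `reward_per_quality` are assumptions made for illustration; the abstract does not specify the form of the utility.

```python
from itertools import combinations


def best_subset(qualities, costs, alpha, reward_per_quality=1.0):
    """Brute-force search for the utility-maximizing subset of agents.

    Assumed (hypothetical) utility of a subset S:
        sum over i in S of (reward_per_quality * qualities[i] - costs[i])
    Constraint taken from the abstract: the average quality of S must be
    at least the threshold alpha.

    Runs in exponential time, so it is only suitable for tiny instances.
    """
    n = len(qualities)
    # Selecting nobody gives utility 0; treat it as the baseline choice.
    best, best_utility = (), 0.0
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            avg_quality = sum(qualities[i] for i in subset) / k
            if avg_quality < alpha:
                continue  # violates the average quality constraint
            utility = sum(
                reward_per_quality * qualities[i] - costs[i] for i in subset
            )
            if utility > best_utility:
                best, best_utility = subset, utility
    return best, best_utility


if __name__ == "__main__":
    # Illustrative numbers only; they do not come from the paper.
    qualities = [0.9, 0.5, 0.3]
    costs = [0.2, 0.1, 0.4]
    subset, utility = best_subset(qualities, costs, alpha=0.65)
    print(subset, round(utility, 3))
```

In this toy instance the second agent falls below the threshold on its own but is still worth selecting alongside the first, because the constraint applies to the average quality of the subset rather than to each agent. In the unknown-quality setting, the true qualities would be replaced by estimates learned over rounds.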


