A Multi-Arm Bandit Approach To Subset Selection Under Constraints

02/09/2021
by   Ayush Deva, et al.
0

We explore the class of problems where a central planner needs to select a subset of agents, each with different quality and cost. The planner wants to maximize its utility while ensuring that the average quality of the selected agents is above a certain threshold. When the agents' quality is known, we formulate our problem as an integer linear program (ILP) and propose a deterministic algorithm, namely that provides an exact solution to our ILP. We then consider the setting when the qualities of the agents are unknown. We model this as a Multi-Arm Bandit (MAB) problem and propose to learn the qualities over multiple rounds. We show that after a certain number of rounds, τ, outputs a subset of agents that satisfy the average quality constraint with a high probability. Next, we provide bounds on τ and prove that after τ rounds, the algorithm incurs a regret of O(ln T), where T is the total number of rounds. We further illustrate the efficacy of through simulations. To overcome the computational limitations of , we propose a polynomial-time greedy algorithm, namely , that provides an approximate solution to our ILP. We also compare the performance of and through experiments.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/22/2021

Communication Efficient Parallel Reinforcement Learning

We consider the problem where M agents interact with M identical and ind...
research
03/27/2020

Distributed Beamforming for Agents with Localization Errors

We consider a scenario in which a group of agents aim to collectively tr...
research
01/23/2022

Distributed Bandits with Heterogeneous Agents

This paper tackles a multi-agent bandit setting where M agents cooperate...
research
05/26/2022

Distributed Contextual Linear Bandits with Minimax Optimal Communication Cost

We study distributed contextual linear bandits with stochastic contexts,...
research
07/27/2023

Adversarial Sleeping Bandit Problems with Multiple Plays: Algorithm and Ranking Application

This paper presents an efficient algorithm to solve the sleeping bandit ...
research
07/16/2022

Collaborative Best Arm Identification with Limited Communication on Non-IID Data

In this paper, we study the tradeoffs between time-speedup and the numbe...
research
07/21/2020

A Greedy Algorithm for the Social Golfer and the Oberwolfach Problem

Inspired by the increasing popularity of Swiss-system tournaments in spo...

Please sign up or login with your details

Forgot password? Click here to reset