
Communication Efficient Parallel Reinforcement Learning
We consider the problem where M agents interact with M identical and ind...

Distributed Beamforming for Agents with Localization Errors
We consider a scenario in which a group of agents aim to collectively tr...

Multi-Agent Multi-Armed Bandits with Limited Communication
We consider the problem where N agents collaboratively interact with an ...

Quantifying the Burden of Exploration and the Unfairness of Free Riding
We consider the multi-armed bandit setting with a twist. Rather than hav...

A Greedy Algorithm for the Social Golfer and the Oberwolfach Problem
Inspired by the increasing popularity of Swiss-system tournaments in spo...

Structured Stochastic Linear Bandits
The stochastic linear bandit problem proceeds in rounds where at each ro...

A PAC algorithm in relative precision for bandit problem with costly sampling
This paper considers the problem of maximizing an expectation function o...
A Multi-Arm Bandit Approach To Subset Selection Under Constraints
We explore the class of problems where a central planner needs to select a subset of agents, each with a different quality and cost. The planner wants to maximize its utility while ensuring that the average quality of the selected agents is above a certain threshold. When the agents' qualities are known, we formulate our problem as an integer linear program (ILP) and propose a deterministic algorithm that provides an exact solution to our ILP. We then consider the setting where the qualities of the agents are unknown. We model this as a Multi-Arm Bandit (MAB) problem and propose to learn the qualities over multiple rounds. We show that after a certain number of rounds, τ, the algorithm outputs a subset of agents that satisfies the average quality constraint with high probability. Next, we provide bounds on τ and prove that after τ rounds the algorithm incurs a regret of O(ln T), where T is the total number of rounds. We further illustrate the efficacy of the algorithm through simulations. To overcome its computational limitations, we propose a polynomial-time greedy algorithm that provides an approximate solution to our ILP. We also compare the performance of the two algorithms through experiments.
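The greedy variant described above can be illustrated with a minimal sketch. This is not the paper's algorithm (whose name and exact objective are not given here); it is a hypothetical greedy rule, assuming utility favors high quality-per-cost agents and that the constraint is a running average-quality threshold:

```python
# Hypothetical greedy sketch for subset selection under an
# average-quality constraint. The objective (quality-per-cost
# ordering) and function name are assumptions for illustration,
# not the paper's algorithm.

def greedy_select(qualities, costs, alpha):
    """Pick agents in decreasing quality-per-cost order, adding an
    agent only if the running average quality stays >= alpha."""
    order = sorted(range(len(qualities)),
                   key=lambda i: qualities[i] / costs[i],
                   reverse=True)
    chosen, q_sum = [], 0.0
    for i in order:
        # Admit agent i only if the constraint still holds.
        if (q_sum + qualities[i]) / (len(chosen) + 1) >= alpha:
            chosen.append(i)
            q_sum += qualities[i]
    return chosen
```

For example, with qualities `[0.9, 0.8, 0.4]`, unit costs, and threshold `alpha = 0.8`, the sketch admits the first two agents and rejects the third, whose inclusion would pull the average to 0.7. In the unknown-quality setting the paper describes, the `qualities` argument would be replaced by estimates learned over rounds of a MAB procedure.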