Communication Efficient Parallel Reinforcement Learning

02/22/2021
by   Mridul Agarwal, et al.

We consider the problem where M agents interact with M identical and independent environments, each with S states and A actions, using reinforcement learning for T rounds. The agents share their data with a central server to minimize their regret. We aim to find an algorithm that allows the agents to minimize the regret while communicating only infrequently. We provide an algorithm that runs at each agent and prove that the total cumulative regret of the M agents is upper bounded as O(DS√(MAT)) for a Markov Decision Process with diameter D, number of states S, and number of actions A. The agents synchronize once their visitations to any state-action pair exceed a certain threshold. Using this, we obtain a bound of O(MSA log(MT)) on the total number of communication rounds. Finally, we evaluate the algorithm on multiple environments and demonstrate that it performs on par with an always-communicating version of the UCRL2 algorithm while requiring significantly less communication.
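The synchronization trigger and the two bounds above can be illustrated with a minimal Python sketch. The abstract does not state the threshold, so the rule used here (an agent requests a sync once its visits to some state-action pair since the last sync reach max(1, N(s,a))/M, where N(s,a) is the server's count at the last sync) is an assumption made purely for illustration, and all function names are hypothetical. The bound helpers evaluate only the expressions inside the O(·), ignoring constants.

```python
import math


def should_sync(local_counts, global_counts, num_agents):
    """Return True if this agent should request a synchronization round.

    ASSUMPTION: the threshold max(1, N(s, a)) / M is illustrative only;
    the abstract says just "a certain threshold".
    local_counts:  visits per (state, action) since the last sync
    global_counts: server-side visits per (state, action) at the last sync
    """
    for state_action, local in local_counts.items():
        threshold = max(1, global_counts.get(state_action, 0)) / num_agents
        if local >= threshold:
            return True
    return False


def regret_bound_term(D, S, A, M, T):
    """Expression inside the regret bound O(D * S * sqrt(M * A * T))."""
    return D * S * math.sqrt(M * A * T)


def communication_bound_term(S, A, M, T):
    """Expression inside the communication bound O(M * S * A * log(M * T))."""
    return M * S * A * math.log(M * T)


if __name__ == "__main__":
    # With 4 agents and 100 pooled visits, the assumed threshold is 25.
    print(should_sync({(0, 1): 2}, {(0, 1): 100}, num_agents=4))
    print(should_sync({(0, 1): 25}, {(0, 1): 100}, num_agents=4))
```

Under this assumed rule the threshold grows with the pooled count, so synchronizations become rarer as data accumulates, which is the kind of behavior that leads to a communication count logarithmic in T.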


