Cooperative Online Learning in Stochastic and Adversarial MDPs

01/31/2022
by Tal Lancewicki, et al.

We study cooperative online learning in stochastic and adversarial Markov decision processes (MDPs). That is, in each episode, m agents interact with an MDP simultaneously and share information in order to minimize their individual regret. We consider environments with two types of randomness: fresh, where each agent's trajectory is sampled i.i.d., and non-fresh, where the realization is shared by all agents (but each agent's trajectory is also affected by its own actions). More precisely, with non-fresh randomness the realization of every cost and transition is fixed at the start of each episode, and agents that take the same action in the same state at the same time observe the same cost and next state. We thoroughly analyze all relevant settings, highlight the challenges and differences between the models, and prove nearly matching regret lower and upper bounds. To our knowledge, we are the first to consider cooperative reinforcement learning (RL) either with non-fresh randomness or in adversarial MDPs.
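The difference between the two types of randomness can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's model or algorithm: the toy MDP (uniform transitions, uniform costs in [0, 1), initial state 0) and the names `sample_realization` and `run_episode` are hypothetical and chosen purely for illustration.

```python
import random

def sample_realization(n_states, n_actions, horizon, rng):
    """Pre-sample one (next_state, cost) pair for every (step, state, action).

    Assumption: a toy MDP with uniform transitions and uniform costs.
    """
    return {
        (h, s, a): (rng.randrange(n_states), rng.random())
        for h in range(horizon)
        for s in range(n_states)
        for a in range(n_actions)
    }

def run_episode(policies, n_states, n_actions, horizon, fresh, seed=0):
    """Run one episode; return a list of (state, action, cost) steps per agent."""
    rng = random.Random(seed)
    # Non-fresh: a single realization is fixed at the start of the episode.
    shared = None if fresh else sample_realization(n_states, n_actions, horizon, rng)
    trajectories = []
    for policy in policies:
        # Fresh: every agent draws its own independent realization.
        table = sample_realization(n_states, n_actions, horizon, rng) if fresh else shared
        state, trajectory = 0, []
        for h in range(horizon):
            action = policy(h, state)
            next_state, cost = table[(h, state, action)]
            trajectory.append((state, action, cost))
            state = next_state
        trajectories.append(trajectory)
    return trajectories

# Two agents following the same (hypothetical) policy.
always_zero = lambda h, s: 0
non_fresh = run_episode([always_zero, always_zero], 3, 2, 5, fresh=False, seed=1)
fresh = run_episode([always_zero, always_zero], 3, 2, 5, fresh=True, seed=1)
```

In this sketch, two agents with identical policies produce identical trajectories under non-fresh randomness, so the second agent gains no new samples, whereas under fresh randomness each agent observes independent draws.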

research
11/03/2021

Online Learning in Adversarial MDPs: Is the Communicating Case Harder than Ergodic?

We study online learning in adversarial communicating Markov Decision Pr...
research
01/31/2022

Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback

The standard assumption in reinforcement learning (RL) is that agents ob...
research
06/26/2014

Online learning in MDPs with side information

We study online learning of finite Markov decision process (MDP) problem...
research
02/25/2022

Learning Dynamic Mechanisms in Unknown Environments: A Reinforcement Learning Approach

Dynamic mechanism design studies how mechanism designers should allocate...
research
07/03/2020

Online learning in MDPs with linear function approximation and bandit feedback

We consider an online learning problem where the learner interacts with ...
research
06/15/2021

Fundamental Limits of Reinforcement Learning in Environment with Endogeneous and Exogeneous Uncertainty

Online reinforcement learning (RL) has been widely applied in informatio...
research
01/23/2019

Learning to Collaborate in Markov Decision Processes

We consider a two-agent MDP framework where agents repeatedly solve a ta...
