Cooperative Online Learning

06/09/2021
by Tommaso R. Cesari, et al.

In this preliminary (and unpolished) version of the paper, we study an asynchronous online learning setting with a network of agents. At each time step, a subset of the agents is activated; each activated agent makes a prediction and pays the corresponding loss. Some feedback is then revealed to these agents and is later propagated through the network. We consider the cases of full, bandit, and semi-bandit feedback. In particular, we construct a reduction to delayed single-agent learning that applies to both the full-feedback and the bandit-feedback case and yields regret guarantees for both settings. We complement these results with a near-matching lower bound.
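To make the reduction idea concrete, here is a minimal toy sketch of the full-feedback case, not the paper's actual algorithm: each agent runs its own exponential-weights (Hedge) learner, and the loss vector revealed at an activation reaches every other agent after a delay equal to its hop distance in the network, so each agent effectively faces a single-agent problem with delayed feedback. The graph, the activation probability, the learning-rate tuning, and all names (`DelayedHedge`, `hop_distances`) are illustrative assumptions, not taken from the paper.

```python
import math
import random
from collections import deque

def hop_distances(adj, source):
    """BFS hop distances from `source` in an undirected graph (adjacency dict)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

class DelayedHedge:
    """Exponential-weights learner whose full-information updates may arrive late.

    Hedge's weight update is a product of per-round factors, so the order in
    which delayed loss vectors are absorbed does not matter.
    """
    def __init__(self, n_actions, eta):
        self.weights = [1.0] * n_actions
        self.eta = eta

    def predict(self):
        # Sample an action with probability proportional to its weight.
        r, acc = random.random() * sum(self.weights), 0.0
        for i, w in enumerate(self.weights):
            acc += w
            if r <= acc:
                return i
        return len(self.weights) - 1

    def update(self, losses):
        for i, loss in enumerate(losses):
            self.weights[i] *= math.exp(-self.eta * loss)

# Toy cooperative run: a path graph on 4 agents, K actions, T rounds.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
dist = {v: hop_distances(adj, v) for v in adj}
K, T = 5, 200
agents = {v: DelayedHedge(K, eta=math.sqrt(math.log(K) / T)) for v in adj}
inbox = {v: [] for v in adj}  # pending (arrival_time, loss_vector) pairs

for t in range(T):
    # Deliver any feedback whose network delay has elapsed.
    for v in adj:
        due = [l for (s, l) in inbox[v] if s <= t]
        inbox[v] = [(s, l) for (s, l) in inbox[v] if s > t]
        for past_losses in due:
            agents[v].update(past_losses)
    active = [v for v in adj if random.random() < 0.5]  # stochastic activations
    losses = [random.random() for _ in range(K)]        # this round's loss vector
    for v in active:
        action = agents[v].predict()                    # agent v pays losses[action]
    if active:
        # The revealed losses propagate through the network; agent u receives
        # them after a delay equal to its distance from the nearest active agent.
        for u in adj:
            delay = min(dist[v][u] for v in active)
            inbox[u].append((t + delay, losses))
```

In this simplified picture, the worst-case delay any agent experiences is bounded by the graph diameter, which is why delayed-feedback regret bounds for the single-agent problem translate into guarantees for the cooperative setting.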


Related research

01/23/2019
Cooperative Online Learning: Keeping your Neighbors Updated
We study an asynchronous online learning setting with a network of agent...

01/31/2022
Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback
The standard assumption in reinforcement learning (RL) is that agents ob...

10/05/2020
An Efficient Algorithm for Cooperative Semi-Bandits
We consider the problem of asynchronous online combinatorial optimizatio...

05/17/2021
Multiclass Classification using dilute bandit feedback
This paper introduces a new online learning framework for multiclass cla...

12/21/2020
Multi-Agent Online Optimization with Delays: Asynchronicity, Adaptivity, and Optimism
Online learning has been successfully applied to many problems in which ...

02/08/2019
Bandit Principal Component Analysis
We consider a partial-feedback variant of the well-studied online PCA pr...

12/02/2020
Instance-Sensitive Algorithms for Pure Exploration in Multinomial Logit Bandit
Motivated by real-world applications such as fast fashion retailing and ...