Best-of-All-Worlds Bounds for Online Learning with Feedback Graphs

07/20/2021
by Liad Erez, et al.

We study the online learning with feedback graphs framework introduced by Mannor and Shamir (2011), in which the feedback received by the online learner is specified by a graph G over the available actions. We develop an algorithm that simultaneously achieves regret bounds of 𝒪(√(θ(G) T)) with adversarial losses; 𝒪(θ(G) polylog T) with stochastic losses; and 𝒪(θ(G) polylog T + √(θ(G) C)) with stochastic losses subject to C adversarial corruptions. Here, θ(G) is the clique covering number of the graph G. Our algorithm is an instantiation of Follow-the-Regularized-Leader with a novel regularizer that can be seen as a product of a Tsallis entropy component (inspired by Zimmert and Seldin (2019)) and a Shannon entropy component (analyzed in the corrupted stochastic case by Amir et al. (2020)), thus subtly interpolating between the two forms of entropy. One of our key technical contributions is establishing the convexity of this regularizer and controlling its inverse Hessian, despite its complex product structure.
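
To make the quantity θ(G) in the bounds above concrete, the following is a minimal sketch, not taken from the paper, of a greedy heuristic that partitions the actions of an undirected feedback graph into cliques. Computing the clique covering number exactly is NP-hard in general, so the number of cliques returned here is only an upper bound on θ(G). The function name greedy_clique_cover and the adjacency-dictionary input format (each action mapped to its set of neighbors, with no self-loops listed) are assumptions made for illustration.

```python
from typing import Dict, List, Set


def greedy_clique_cover(adj: Dict[int, Set[int]]) -> List[Set[int]]:
    """Greedily partition the vertices of an undirected graph into cliques.

    `adj` maps each vertex to the set of its neighbors (self excluded).
    The number of cliques returned is an upper bound on theta(G),
    not necessarily the exact clique covering number.
    """
    cliques: List[Set[int]] = []
    for v in sorted(adj):
        for clique in cliques:
            # v may join a clique only if it is adjacent to every member.
            if clique <= adj[v]:
                clique.add(v)
                break
        else:
            # No existing clique can absorb v, so it starts a new one.
            cliques.append({v})
    return cliques


# Full-information feedback: every action reveals every other action.
full_info = {i: {j for j in range(4) if j != i} for i in range(4)}
# Bandit feedback: each action reveals only itself (no edges between actions).
bandit = {i: set() for i in range(4)}
# A path 0 - 1 - 2 - 3.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}

print(len(greedy_clique_cover(full_info)))  # 1
print(len(greedy_clique_cover(bandit)))     # 4
print(len(greedy_clique_cover(path)))       # 2
```

In the two extreme cases the cover is exact: a complete graph has θ(G) = 1 and an edgeless graph on K actions has θ(G) = K, so the adversarial bound 𝒪(√(θ(G) T)) reads as 𝒪(√T) and 𝒪(√(K T)) respectively.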

