Nonstochastic Bandits with Composite Anonymous Feedback

12/06/2021
by Nicolò Cesa-Bianchi, et al.

We investigate a nonstochastic bandit setting in which the loss of an action is not immediately charged to the player, but rather spread over the subsequent rounds in an adversarial way. The instantaneous loss observed by the player at the end of each round is then a sum of many loss components of previously played actions. This setting encompasses as a special case the easier task of bandits with delayed feedback, a well-studied framework where the player observes the delayed losses individually. Our first contribution is a general reduction transforming a standard bandit algorithm into one that can operate in the harder setting: We bound the regret of the transformed algorithm in terms of the stability and regret of the original algorithm. Then, we show that the transformation of a suitably tuned FTRL with Tsallis entropy has a regret of order √((d+1)KT), where d is the maximum delay, K is the number of arms, and T is the time horizon. Finally, we show that our results cannot be improved in general by exhibiting a matching (up to a log factor) lower bound on the regret of any algorithm operating in this setting.
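Two ingredients named in the abstract can be illustrated with a small Python sketch: the composite anonymous feedback model, and one FTRL step with a Tsallis-entropy regularizer. This is a minimal sketch under stated assumptions, not the authors' method. The feedback simulator assumes each loss is split into d+1 parts charged at rounds t, t+1, ..., t+d, using a hypothetical `components[t][a][s]` layout. The FTRL step assumes the 1/2-Tsallis regularizer -(4/eta) Σ√w_i (the Tsallis-INF form), which may differ from the paper's tuning. The reduction itself is not reproduced. The function names `composite_observations`, `tsallis_inf_distribution` and `regret_order` are hypothetical.

```python
import math
from typing import List, Sequence


def composite_observations(actions: Sequence[int],
                           components: Sequence[Sequence[Sequence[float]]]) -> List[float]:
    """Aggregated feedback seen by the player under composite anonymous losses.

    Hypothetical data layout: components[t][a][s] is the part of the loss of
    action a at round t that the adversary charges at round t + s, s = 0..d.
    At the end of round u the player only sees the sum of all parts landing
    on u, without knowing which past action produced which part.
    """
    T = len(actions)
    observed = [0.0] * T
    for t, a in enumerate(actions):
        for s, part in enumerate(components[t][a]):
            if t + s < T:  # parts charged after the horizon are never observed
                observed[t + s] += part
    return observed


def tsallis_inf_distribution(cum_loss_est: Sequence[float], eta: float,
                             iters: int = 200) -> List[float]:
    """FTRL over the simplex with the assumed regularizer -(4/eta) * sum_i sqrt(w_i).

    The minimizer has the form w_i = 4 / (eta * (L_i - x))**2 for a normalizer
    x < min_i L_i, found here by bisection so that sum_i w_i = 1.
    """
    K = len(cum_loss_est)
    L_min = min(cum_loss_est)

    def total(x: float) -> float:
        # Sum of the unnormalized weights; increasing in x on (-inf, L_min).
        return sum(4.0 / (eta * (L - x)) ** 2 for L in cum_loss_est)

    # At x = L_min - 2/eta the best arm alone has weight 1, so total(hi) >= 1;
    # at x = L_min - 2*sqrt(K)/eta every weight is <= 1/K, so total(lo) <= 1.
    lo, hi = L_min - 2.0 * math.sqrt(K) / eta, L_min - 2.0 / eta
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total(mid) > 1.0:
            hi = mid  # weights too large: push x further below L_min
        else:
            lo = mid
    x = 0.5 * (lo + hi)
    w = [4.0 / (eta * (L - x)) ** 2 for L in cum_loss_est]
    s = sum(w)
    return [wi / s for wi in w]  # renormalize away residual bisection error


def regret_order(d: int, K: int, T: int) -> float:
    """The sqrt((d+1)KT) rate from the abstract, constants and log factors omitted."""
    return math.sqrt((d + 1) * K * T)
```

For example, `regret_order(3, 10, 1000)` evaluates to 200.0. With d = 0 (no spreading of losses) the rate reduces to the familiar √(KT) of standard adversarial bandits. Calling `tsallis_inf_distribution([0.0, 5.0, 10.0], eta=0.1)` returns a probability vector that puts the most mass on the arm with the smallest estimated cumulative loss. Bisection is used because it is simple and provably bracketed; Newton's method on the same normalizer is a common faster alternative.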

research
04/27/2022

Bounded Memory Adversarial Bandits with Composite Anonymous Delayed Feedback

We study the adversarial bandit problem with composite anonymous delayed...
research
03/23/2023

Stochastic Submodular Bandits with Delayed Composite Anonymous Bandit Feedback

This paper investigates the problem of combinatorial multiarmed bandits ...
research
05/28/2019

Combinatorial Bandits with Full-Bandit Feedback: Sample Complexity and Regret Minimization

Combinatorial Bandits generalize multi-armed bandits, where k out of n a...
research
07/04/2018

Factored Bandits

We introduce the factored bandits model, which is a framework for learni...
research
08/30/2013

Online Ranking: Discrete Choice, Spearman Correlation and Other Feedback

Given a set V of n objects, an online ranking system outputs at each tim...
research
02/24/2020

Fair Bandit Learning with Delayed Impact of Actions

Algorithmic fairness has been studied mostly in a static setting where t...
research
03/23/2021

Improved Analysis of Robustness of the Tsallis-INF Algorithm to Adversarial Corruptions in Stochastic Multiarmed Bandits

We derive improved regret bounds for the Tsallis-INF algorithm of Zimmer...
