Second Order Regret Bounds Against Generalized Expert Sequences under Partial Bandit Feedback

04/13/2022
by Kaan Gokcesu, et al.

We study the problem of prediction with expert advice under a partial bandit feedback setting and design a sequential minimax optimal algorithm. Our algorithm works in a more general partial monitoring setting where, in contrast to classical bandit feedback, the losses can be revealed in an adversarial manner. The algorithm adopts a universal prediction perspective, and its performance is analyzed in terms of regret against a general expert selection sequence. The competition class is general enough to cover many settings (such as the switching or contextual experts settings), and the expert selection sequences it contains are determined by the application at hand. Our regret bounds are second-order bounds in terms of the sum of squared losses, and the normalized regret of our algorithm is invariant under arbitrary affine transforms of the loss sequence. Our algorithm is truly online and does not use any preliminary information about the loss sequences.
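To make the two central notions of the abstract concrete, the following minimal sketch computes the regret of a learner against a competing expert selection sequence and checks that the regret, once normalized by the range of the losses, does not change under an affine transform of the losses. It is not the algorithm of the paper: the function names, the toy loss table, the selection sequences, and the choice of the loss range as the normalizer are all assumptions made for illustration, and the transform is assumed to have a positive scale.

```python
from typing import List, Sequence


def cumulative_loss(losses: Sequence[Sequence[float]], picks: Sequence[int]) -> float:
    """Total loss incurred by choosing expert picks[t] at round t."""
    return sum(losses[t][picks[t]] for t in range(len(picks)))


def normalized_regret(
    losses: Sequence[Sequence[float]],
    learner_picks: Sequence[int],
    competitor_picks: Sequence[int],
) -> float:
    """Regret against a competing selection sequence, divided by the loss range.

    Normalizing by (max loss - min loss) is an illustrative assumption here,
    not necessarily the normalization used in the paper.
    """
    flat = [value for row in losses for value in row]
    loss_range = max(flat) - min(flat)
    regret = cumulative_loss(losses, learner_picks) - cumulative_loss(
        losses, competitor_picks
    )
    return regret / loss_range


# Hypothetical toy data: 4 rounds, 3 experts, losses[t][m] is expert m's loss at round t.
losses: List[List[float]] = [
    [0.0, 1.0, 0.5],
    [1.0, 0.0, 0.5],
    [0.25, 0.75, 1.0],
    [1.0, 0.5, 0.0],
]
learner_picks = [1, 0, 2, 1]      # experts chosen by some learner
competitor_picks = [0, 1, 0, 2]   # a switching expert sequence to compete against

# Affine transform a * loss + b with a > 0 (values chosen arbitrarily).
scaled = [[3.0 * value - 7.0 for value in row] for row in losses]

original = normalized_regret(losses, learner_picks, competitor_picks)
transformed = normalized_regret(scaled, learner_picks, competitor_picks)
print(original, transformed)
```

The shift b cancels in the difference of cumulative losses, and the scale a cancels against the loss range, which is why both calls return the same value for fixed selection sequences. The claim in the abstract is stronger: the selections made by the algorithm itself do not depend on such a transform.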


research
03/12/2023

Data Dependent Regret Guarantees Against General Comparators for Full or Bandit Feedback

We study the adversarial online learning problem and create a completely...
research
02/10/2014

A Second-order Bound with Excess Losses

We study online aggregation of the predictions of experts, and first sho...
research
12/17/2020

Experts with Lower-Bounded Loss Feedback: A Unifying Framework

The most prominent feedback models for the best expert problem are the f...
research
02/27/2019

Adaptive Hedging under Delayed Feedback

The article is devoted to investigating the application of hedging strat...
research
07/09/2008

Algorithm Selection as a Bandit Problem with Unbounded Losses

Algorithm selection is typically based on models of algorithm performanc...
research
08/07/2022

Optimal Tracking in Prediction with Expert Advice

We study the prediction with expert advice setting, where the aim is to ...
research
10/19/2019

On Adaptivity in Information-constrained Online Learning

We study how to adapt to smoothly-varying (`easy') environments in well-...
