Data Dependent Regret Guarantees Against General Comparators for Full or Bandit Feedback

03/12/2023
by Kaan Gokcesu, et al.

We study the adversarial online learning problem and create a completely online algorithmic framework with data-dependent regret guarantees in both the full expert feedback and bandit feedback settings. We study the expected performance of our algorithm against general comparators, which makes it applicable to a wide variety of problem scenarios. Our algorithm works from a universal prediction perspective, and the performance measure is the expected regret against arbitrary comparator sequences, i.e., the difference between our cumulative loss and that of a competing loss sequence. The competition class can be designed to include fixed arm selections, switching bandits, contextual bandits, periodic bandits, or any other competition of interest; the sequences in the competition class are generally determined by the specific application at hand and should be designed accordingly. Our algorithm neither uses nor needs any preliminary information about the loss sequences and is completely online. Its performance bounds are data dependent, in the sense that any affine transform of the losses has no effect on the normalized regret.
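The paper's algorithm itself is not reproduced here, but the regret measure described in the abstract can be illustrated with a minimal sketch: an Exp3-style exponential-weights learner that observes only the loss of the arm it plays (bandit feedback, handled via importance-weighted loss estimates), with its cumulative loss compared against an arbitrary comparator sequence such as a switching sequence of arms. All quantities below (the loss matrix `losses`, learning rate `eta`, horizon `T`, number of arms `K`, and the `comparator` sequence) are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch (not the paper's algorithm): Exp3-style exponential weights
# under bandit feedback, with regret measured against an arbitrary comparator
# sequence. All parameters are assumed for illustration only.
import numpy as np

rng = np.random.default_rng(0)
T, K = 1000, 5                         # rounds and number of arms (assumed)
losses = rng.uniform(size=(T, K))      # adversarial loss sequence in [0, 1]
comparator = rng.integers(K, size=T)   # arbitrary comparator sequence (e.g. switching arms)

eta = np.sqrt(2 * np.log(K) / (T * K)) # standard Exp3-style learning rate
weights = np.ones(K)
our_loss, comp_loss = 0.0, 0.0

for t in range(T):
    probs = weights / weights.sum()
    arm = rng.choice(K, p=probs)               # sample an arm from the current distribution
    our_loss += losses[t, arm]
    comp_loss += losses[t, comparator[t]]
    # Bandit feedback: only losses[t, arm] is observed; build an unbiased estimate.
    est = np.zeros(K)
    est[arm] = losses[t, arm] / probs[arm]
    weights *= np.exp(-eta * est)
    weights /= weights.sum()                   # renormalize for numerical stability

print("regret against the comparator sequence:", our_loss - comp_loss)
```

Averaging `our_loss - comp_loss` over repeated runs approximates the expected regret against that particular comparator sequence; richer competition classes (e.g. contextual or periodic bandits) correspond to different choices of the `comparator` sequence.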


