Small-loss bounds for online learning with partial information

11/09/2017
by Thodoris Lykouris, et al.

We consider the problem of adversarial (non-stochastic) online learning with partial information feedback, where at each stage a decision maker picks an action from a finite set of possible actions. We develop a black-box approach to solving such problems when the learner observes as feedback only the losses of a subset of the actions that includes the selected action. Specifically, when losses of actions are non-negative, under the graph-based feedback model introduced by Mannor and Shamir, we offer algorithms that attain the so-called "small-loss" regret bounds with high probability. Prior to our work, there was no data-dependent guarantee for general feedback graphs even for pseudo-regret (without dependence on the number of actions, i.e., taking advantage of the increased information feedback). Addressing this, we provide a high-probability small-loss guarantee. Taking advantage of the black-box nature of our technique, we obtain high-probability small-loss guarantees for semi-bandits (including routing in networks) and contextual bandits, including a possibly infinite comparator class (such as infinitely many possible strategies in contextual bandits), as well as for learning with slowly changing (shifting) comparators. In the special case of classical bandit and semi-bandit problems, we provide optimal small-loss, high-probability guarantees of O(√(dL*)) for the actual regret, where d is the number of arms and L* is the loss of the best arm or action, answering open questions of Neu. Previous work for bandits and semi-bandits offered analogous regret guarantees only for pseudo-regret and only in expectation.
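
To make the graph-based feedback model concrete, the following is a minimal sketch of an importance-weighted loss estimator that is standard in the graph-feedback literature. It is not claimed to be the algorithm of this paper. The names `observation_probs`, `loss_estimates`, and `expected_estimates`, the three-action graph, and the numeric values are all hypothetical and chosen only for illustration. The sketch assumes that playing action i reveals the losses of i and of its out-neighbors, and that every action is played with positive probability.

```python
from typing import Dict, List, Set


def observation_probs(p: List[float], neighbors: Dict[int, Set[int]]) -> List[float]:
    """q[j] = probability that the loss of action j is revealed this round."""
    q = [0.0] * len(p)
    for i, prob in enumerate(p):
        # Playing i reveals i itself plus its out-neighbors in the feedback graph.
        for j in neighbors[i] | {i}:
            q[j] += prob
    return q


def loss_estimates(p: List[float], neighbors: Dict[int, Set[int]],
                   played: int, losses: List[float]) -> List[float]:
    """Importance-weighted estimates: observed losses divided by q[j], else 0."""
    q = observation_probs(p, neighbors)
    observed = neighbors[played] | {played}
    return [losses[j] / q[j] if j in observed else 0.0 for j in range(len(p))]


def expected_estimates(p: List[float], neighbors: Dict[int, Set[int]],
                       losses: List[float]) -> List[float]:
    """Average the estimates over the learner's random choice of action."""
    total = [0.0] * len(p)
    for i, prob in enumerate(p):
        est = loss_estimates(p, neighbors, i, losses)
        total = [t + prob * e for t, e in zip(total, est)]
    return total


# Hypothetical example: 3 actions, a directed feedback graph, losses in [0, 1].
p = [0.5, 0.3, 0.2]
neighbors = {0: {1}, 1: {0, 2}, 2: set()}
losses = [0.2, 0.7, 0.1]
print(expected_estimates(p, neighbors, losses))
```

Averaged over the learner's randomness, each estimate equals the true loss, which is the unbiasedness property that expectation-style analyses rely on. High-probability guarantees of the kind described in the abstract generally need additional control of the variance of such estimates.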
