Contextual bandits with surrogate losses: Margin bounds and efficient algorithms

06/28/2018
by   Dylan J. Foster, et al.

We introduce a new family of margin-based regret guarantees for adversarial contextual bandit learning. Our results are based on multiclass surrogate losses. Using the ramp loss, we derive a universal margin-based regret bound in terms of the sequential metric entropy for a benchmark class of real-valued regression functions. The new margin bound serves as a complete contextual bandit analogue of the classical margin bound from statistical learning. The result applies to large nonparametric classes, improving on the best known results for Lipschitz contextual bandits (Cesa-Bianchi et al., 2017) and, as a special case, generalizes the dimension-independent Banditron regret bound (Kakade et al., 2008) to arbitrary linear classes with smooth norms. On the algorithmic side, we use the hinge loss to derive an efficient algorithm with a √(dT)-type mistake bound against benchmark policies induced by d-dimensional regression functions. This provides the first hinge loss-based solution to the open problem of Abernethy and Rakhlin (2009). With an additional i.i.d. assumption we give a simple oracle-efficient algorithm whose regret matches our generic metric entropy-based bound for sufficiently complex nonparametric classes. Under realizability assumptions our results also yield classical regret bounds.
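The ramp and hinge surrogates in the abstract are both functions of the multiclass margin (the score of the correct label minus the best competing score). A minimal sketch of the two surrogates, with hypothetical helper names (`margin`, `hinge_loss`, `ramp_loss` are illustrative, not from the paper):

```python
import numpy as np

def margin(scores, y):
    """Multiclass margin: score of the correct label minus the best other score."""
    others = np.delete(scores, y)
    return scores[y] - np.max(others)

def hinge_loss(scores, y):
    """Multiclass hinge surrogate: max(0, 1 - margin). Convex, unbounded."""
    return max(0.0, 1.0 - margin(scores, y))

def ramp_loss(scores, y):
    """Ramp surrogate: the hinge clipped at 1, i.e. min(1, max(0, 1 - margin)).
    Bounded in [0, 1], which is what makes it suitable for regret analysis."""
    return min(1.0, hinge_loss(scores, y))

scores = np.array([2.0, 0.5, 1.8])
print(hinge_loss(scores, 0))  # margin 0.2 -> hinge 0.8
print(ramp_loss(scores, 2))   # margin -0.2 -> hinge 1.2, clipped to 1.0
```

Both surrogates upper-bound the 0-1 mistake indicator; the bounded ramp loss underlies the metric-entropy regret bound, while the convex hinge loss is what makes the efficient √(dT)-type algorithm possible.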


Related research

- 02/26/2021: Adapting to misspecification in contextual bandits with offline regression oracles
  Computationally efficient contextual bandits are often based on estimati...
- 03/05/2020: Generalized Policy Elimination: an efficient algorithm for Nonparametric Contextual Bandits
  We propose the Generalized Policy Elimination (GPE) algorithm, an oracle...
- 02/27/2017: Algorithmic Chaining and the Role of Partial Feedback in Online Nonparametric Learning
  We investigate contextual online learning with nonparametric (Lipschitz)...
- 11/17/2021: Fast Rates for Nonparametric Online Learning: From Realizability to Learning in Games
  We study fast rates of convergence in the setting of nonparametric onlin...
- 10/11/2018: Fighting Contextual Bandits with Stochastic Smoothing
  We introduce a new stochastic smoothing perspective to study adversarial...
- 05/27/2022: Lifting the Information Ratio: An Information-Theoretic Analysis of Thompson Sampling for Contextual Bandits
  We study the Bayesian regret of the renowned Thompson Sampling algorithm...
- 07/16/2020: Comparator-adaptive Convex Bandits
  We study bandit convex optimization methods that adapt to the norm of th...
