Provably Optimal Algorithms for Generalized Linear Contextual Bandits

02/28/2017
by Lihong Li, et al.

Contextual bandits are widely used in Internet services, from news recommendation to advertising and Web search. Generalized linear models (logistic regression in particular) have demonstrated stronger performance than linear models in many applications where rewards are binary. However, most theoretical analyses of contextual bandits so far have focused on linear bandits. In this work, we propose an upper-confidence-bound-based algorithm for generalized linear contextual bandits that achieves Õ(√(dT)) regret over T rounds with d-dimensional feature vectors. This regret matches the minimax lower bound up to logarithmic terms and improves on the best previous result by a √(d) factor, assuming the number of arms is fixed. A key component of our analysis is a new, sharp finite-sample confidence bound for maximum-likelihood estimates in generalized linear models, which may be of independent interest. We also analyze a simpler upper-confidence-bound algorithm, which is useful in practice, and prove that it has optimal regret in certain cases.
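The abstract does not spell out either algorithm. As a rough illustration of the general optimism-based idea it describes, the sketch below refits a logistic-regression estimate θ̂ every round and plays the arm that maximizes x^T θ̂ + α‖x‖_{V^{-1}}, where V accumulates the outer products of the played feature vectors. This is a sketch under stated assumptions, not the paper's method. The exploration weight `alpha`, the ridge term `lam` (added only for numerical stability), the warm-started Newton solver `fit_logistic`, and the simulated environment in `run` (with a hypothetical `theta_star`) are all illustrative choices, and `alpha` is not the paper's calibrated confidence width.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic link mu(z) = 1 / (1 + e^{-z}).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def solve(A, b):
    # Solve A x = b for a small dense system (Gaussian elimination, partial pivoting).
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_logistic(X, y, theta, lam=1.0, iters=5):
    # Newton steps on the log-likelihood minus (lam/2)*||theta||^2, warm-started at theta.
    d = len(theta)
    for _ in range(iters):
        grad = [-lam * t for t in theta]
        H = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
        for x, r in zip(X, y):
            p = sigmoid(dot(x, theta))
            w = p * (1.0 - p)  # derivative of the logistic link
            for i in range(d):
                grad[i] += (r - p) * x[i]
                for j in range(d):
                    H[i][j] += w * x[i] * x[j]
        step = solve(H, grad)  # H is the negated Hessian, so this is an ascent step
        theta = [t + s for t, s in zip(theta, step)]
    return theta

def ucb_choice(contexts, theta, V, alpha):
    # Index = x^T theta + alpha * ||x||_{V^{-1}}; mu is increasing, so ranking
    # by the optimistic linear predictor is enough.
    def index(x):
        return dot(x, theta) + alpha * math.sqrt(dot(x, solve(V, x)))
    return max(range(len(contexts)), key=lambda k: index(contexts[k]))

def run(T=200, K=5, d=3, alpha=1.0, lam=1.0, seed=0):
    # Hypothetical simulation: Bernoulli rewards with P(r = 1) = sigmoid(x^T theta_star).
    rng = random.Random(seed)
    theta_star = [rng.gauss(0.0, 1.0) for _ in range(d)]
    theta = [0.0] * d
    V = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    X, y, regret = [], [], 0.0
    for _ in range(T):
        contexts = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(K)]
        x = contexts[ucb_choice(contexts, theta, V, alpha)]
        r = 1 if rng.random() < sigmoid(dot(x, theta_star)) else 0
        best = max(sigmoid(dot(c, theta_star)) for c in contexts)
        regret += best - sigmoid(dot(x, theta_star))  # expected-reward gap this round
        X.append(x)
        y.append(r)
        for i in range(d):
            for j in range(d):
                V[i][j] += x[i] * x[j]
        theta = fit_logistic(X, y, theta, lam=lam)
    return regret, theta, theta_star
```

For example, `regret, theta_hat, theta_star = run(T=200, seed=0)` returns the simulated cumulative expected regret together with the final estimate. For another canonical-link GLM, only `sigmoid` and the weight `p * (1 - p)` (the link's derivative) would need to change.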

03/25/2021

Multinomial Logit Contextual Bandits: Provable Optimality and Practicality

We consider a sequential assortment selection problem where the user cho...
03/15/2023

Borda Regret Minimization for Generalized Linear Dueling Bandits

Dueling bandits are widely used to model preferential feedback that is p...
03/11/2020

Delay-Adaptive Learning in Generalized Linear Contextual Bandits

In this paper, we consider online learning in generalized linear context...
05/21/2022

Pessimism for Offline Linear Contextual Bandits using ℓ_p Confidence Sets

We present a family {π̂}_{p≥1} of pessimistic learning rules for offline ...
06/08/2020

Learning the Truth From Only One Side of the Story

Learning under one-sided feedback (i.e., where examples arrive in an onl...
09/15/2022

Double Doubly Robust Thompson Sampling for Generalized Linear Contextual Bandits

We propose a novel contextual bandit algorithm for generalized linear re...
03/23/2020

Algorithms for Non-Stationary Generalized Linear Bandits

The statistical framework of Generalized Linear Models (GLM) can be appl...
