Double Doubly Robust Thompson Sampling for Generalized Linear Contextual Bandits

09/15/2022
by Wonyoung Kim, et al.

We propose a novel contextual bandit algorithm for generalized linear rewards with an Õ(√(κ^{-1} ϕ T)) regret bound over T rounds, where ϕ is the minimum eigenvalue of the covariance of contexts and κ is a lower bound on the variance of rewards. In several practical cases where ϕ = O(d), our result is the first regret bound for generalized linear model (GLM) bandits of order √d that does not rely on the approach of Auer [2002]. We achieve this bound using a novel estimator called the double doubly-robust (DDR) estimator, a subclass of doubly-robust (DR) estimators with a tighter error bound. The approach of Auer [2002] achieves independence by discarding the observed rewards, whereas our algorithm achieves independence while considering all contexts, using our DDR estimator. We also provide an O(κ^{-1} ϕ log(NT) log T) regret bound for N arms under a probabilistic margin condition. Regret bounds under the margin condition are given by Bastani and Bayati [2020] and Bastani et al. [2021] for the setting in which contexts are common to all arms but coefficients are arm-specific. For the setting in which contexts differ across arms but coefficients are common, ours is the first regret bound under the margin condition for linear models or GLMs. We conduct empirical studies using synthetic data and real examples, demonstrating the effectiveness of our algorithm.
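
To illustrate the general idea behind doubly-robust estimation in a bandit round, the following is a minimal sketch of a generic DR pseudo-reward, not the DDR estimator proposed in the paper (whose exact form is not given in the abstract). It assumes a logistic link, a current coefficient estimate, and known selection probabilities; the names `dr_pseudo_rewards`, `beta_hat`, and `probs` are hypothetical and chosen only for this example.

```python
import math


def sigmoid(z):
    """Logistic link function, used here as an assumed GLM mean function."""
    return 1.0 / (1.0 + math.exp(-z))


def dr_pseudo_rewards(contexts, beta_hat, chosen, reward, probs):
    """Generic doubly-robust pseudo-rewards for all N arms in one round.

    contexts : list of N context vectors (one per arm)
    beta_hat : current coefficient estimate (assumed to be given)
    chosen   : index of the arm that was actually pulled
    reward   : reward observed for the pulled arm
    probs    : probability with which each arm would have been selected

    Every arm receives the model prediction; the pulled arm also receives
    an inverse-probability-weighted residual correction. This is an
    illustrative sketch, not the paper's DDR estimator.
    """
    pseudo = []
    for i, x in enumerate(contexts):
        # Model-based prediction of the mean reward for arm i.
        pred = sigmoid(sum(xj * bj for xj, bj in zip(x, beta_hat)))
        # The residual correction applies only to the arm that was pulled.
        correction = (reward - pred) / probs[i] if i == chosen else 0.0
        pseudo.append(pred + correction)
    return pseudo


if __name__ == "__main__":
    contexts = [[1.0, 0.0], [0.0, 1.0]]
    print(dr_pseudo_rewards(contexts, [0.0, 0.0], chosen=0, reward=1.0,
                            probs=[0.5, 0.5]))
```

Because every arm gets a pseudo-reward, the contexts of unselected arms can also enter the estimation. This is the sense in which DR-type methods use all contexts rather than only the selected one.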

research
06/11/2022

Squeeze All: Novel Estimator and Self-Normalized Bound for Linear Contextual Bandits

We propose a novel algorithm for linear contextual bandits with O(√(dT l...
research
02/28/2017

Provably Optimal Algorithms for Generalized Linear Contextual Bandits

Contextual bandits are widely used in Internet services from news recomm...
research
03/30/2023

Contextual Combinatorial Bandits with Probabilistically Triggered Arms

We study contextual combinatorial bandits with probabilistically trigger...
research
03/15/2023

Borda Regret Minimization for Generalized Linear Dueling Bandits

Dueling bandits are widely used to model preferential feedback that is p...
research
06/01/2016

Contextual Bandits with Latent Confounders: An NMF Approach

Motivated by online recommendation and advertising systems, we consider ...
research
08/21/2023

Clustered Linear Contextual Bandits with Knapsacks

In this work, we study clustered contextual bandits where rewards and re...
research
07/15/2019

A Dimension-free Algorithm for Contextual Continuum-armed Bandits

In contextual continuum-armed bandits, the contexts x and the arms y are...
