Doubly Robust Thompson Sampling for linear payoffs

02/01/2021
by Wonyoung Kim, et al.

A challenging aspect of the bandit problem is that a stochastic reward is observed only for the chosen arm, while the rewards of the other arms remain missing. Because the arm choice depends on past context and reward pairs, the contexts of the chosen arms are correlated, which makes the analysis difficult. We propose a novel multi-armed contextual bandit algorithm called Doubly Robust (DR) Thompson Sampling (TS) that applies the DR technique from the missing data literature to TS. The proposed algorithm improves the regret bound of TS by a factor of √d, where d is the dimension of the context. A benefit of the proposed method is that it uses all the context data, whether chosen or not, which makes it possible to circumvent the technical definition of unsaturated arms used in the theoretical analysis of TS. Empirical studies show the advantage of the proposed algorithm over TS.
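To make the missing-data view concrete, the following is a minimal sketch of the generic doubly robust pseudo-reward from the missing data literature, applied to a single round of a linear contextual bandit. It is not the paper's algorithm: the function name `dr_pseudo_rewards`, its arguments, the selection probabilities `probs`, and the imputation estimate `beta_hat` are all assumptions introduced here for illustration. The sketch imputes a reward for every arm from a linear model and corrects only the chosen arm with an inverse-probability-weighted residual, so every context contributes a pseudo-reward.

```python
import numpy as np


def dr_pseudo_rewards(contexts, chosen, reward, probs, beta_hat):
    """Generic doubly robust pseudo-rewards for one round (illustrative only).

    contexts : (N, d) array, one context vector per arm
    chosen   : index of the arm that was actually pulled
    reward   : observed reward of the chosen arm
    probs    : (N,) array, probability with which each arm was selected
    beta_hat : (d,) array, any imputation estimate of the linear parameter
    """
    contexts = np.asarray(contexts, dtype=float)
    probs = np.asarray(probs, dtype=float)

    # Model-based imputation for every arm, chosen or not.
    imputed = contexts @ np.asarray(beta_hat, dtype=float)

    # Inverse-probability-weighted residual correction on the chosen arm only.
    pseudo = imputed.copy()
    pseudo[chosen] += (reward - imputed[chosen]) / probs[chosen]
    return pseudo


# Hypothetical two-arm example with a deliberately wrong imputation model.
contexts = np.eye(2)
beta_hat = np.array([1.0, 2.0])
probs = np.array([0.5, 0.5])
pseudo = dr_pseudo_rewards(contexts, chosen=0, reward=3.0, probs=probs, beta_hat=beta_hat)
```

Under these assumptions, averaging the pseudo-reward of an arm over which arm gets chosen recovers that arm's mean reward when `probs` is correct, even if `beta_hat` is wrong; this is the robustness property the DR name refers to.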


