A General Framework to Analyze Stochastic Linear Bandit

02/12/2020
by   Nima Hamidi, et al.
0

In this paper we study the well-known stochastic linear bandit problem where a decision-maker sequentially chooses among a set of given actions in R^d, observes their noisy reward, and aims to maximize her cumulative expected reward over a horizon of length T. We introduce a general family of algorithms for the problem and prove that they are rate optimal. We also show that several well-known algorithms for the problem such as optimism in the face of uncertainty linear bandit (OFUL) and Thompson sampling (TS) are special cases of our family of algorithms. Therefore, we obtain a unified proof of rate optimality for both of these algorithms. Our results include both adversarial action sets (when actions are potentially selected by an adversary) and stochastic action sets (when actions are independently drawn from an unknown distribution). In terms of regret, our results apply to both Bayesian and worst-case regret settings. Our new unified analysis technique also yields a number of new results and solves two open problems known in the literature. Most notably, (1) we show that TS can incur a linear worst-case regret, unless it uses inflated (by a factor of √(d)) posterior variances at each step. This shows that the best known worst-case regret bound for TS, that is given by (Agrawal Goyal, 2013; Abeille et al., 2017) and is worse (by a factor of √(()d)) than the best known Bayesian regret bound given by Russo and Van Roy (2014) for TS, is tight. This settles an open problem stated in Russo et al., 2018. (2) Our proof also shows that TS can incur a linear Bayesian regret if it does not use the correct prior or noise distribution. (3) Under a generalized gap assumption and a margin condition, as in Goldenshluger Zeevi, 2013, we obtain a poly-logarithmic (in T) regret bound for OFUL and TS in the stochastic setting.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/11/2020

On Worst-case Regret of Linear Thompson Sampling

In this paper, we consider the worst-case regret of Linear Thompson Samp...
research
11/08/2022

Contexts can be Cheap: Solving Stochastic Contextual Bandits with Linear Bandit Algorithms

In this paper, we address the stochastic contextual linear bandit proble...
research
07/19/2022

Regret Minimization with Noisy Observations

In a typical optimization problem, the task is to pick one of a number o...
research
11/20/2016

Linear Thompson Sampling Revisited

We derive an alternative proof for the regret of Thompson sampling () in...
research
02/27/2023

Equilibrium Bandits: Learning Optimal Equilibria of Unknown Dynamics

Consider a decision-maker that can pick one out of K actions to control ...
research
05/12/2019

On the Performance of Thompson Sampling on Logistic Bandits

We study the logistic bandit, in which rewards are binary with success p...
research
02/02/2019

First-Order Regret Analysis of Thompson Sampling

We address online combinatorial optimization when the player has a prior...

Please sign up or login with your details

Forgot password? Click here to reset