The Randomized Elliptical Potential Lemma with an Application to Linear Thompson Sampling

02/16/2021
by Nima Hamidi, et al.

In this note, we introduce a randomized version of the well-known elliptical potential lemma that is widely used in the analysis of algorithms for sequential learning and decision-making problems such as stochastic linear bandits. Our randomized elliptical potential lemma relaxes the Gaussian assumptions on the observation noise and on the prior distribution of the problem parameters. We then use this generalization to prove an improved Bayesian regret bound for Thompson sampling in linear stochastic bandits with changing action sets, where the prior and noise distributions are general. This bound is minimax optimal up to constants.
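
For context, the classical (deterministic) elliptical potential lemma that this note generalizes is commonly stated as follows; the notation and constants below follow a standard textbook form and are not taken from the note itself. For vectors $x_1, \dots, x_T \in \mathbb{R}^d$ with $\|x_t\|_2 \le L$, define $V_0 = \lambda I$ and $V_t = V_{t-1} + x_t x_t^\top$ for some $\lambda > 0$. Then
$$
\sum_{t=1}^{T} \min\bigl\{1, \|x_t\|_{V_{t-1}^{-1}}^{2}\bigr\}
\;\le\; 2 \log \frac{\det V_T}{\det V_0}
\;\le\; 2 d \log\Bigl(1 + \frac{T L^2}{\lambda d}\Bigr),
$$
where $\|x\|_{A}^{2} = x^\top A x$. The randomized version introduced in the note relaxes the distributional assumptions behind bounds of this type; its precise statement is not reproduced here.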

Related research

Randomized Exploration in Generalized Linear Bandits (06/21/2019)
We study two randomized algorithms for generalized linear bandits, GLM-T...

Information Directed Sampling and Bandits with Heteroscedastic Noise (01/29/2018)
In the stochastic bandit problem, the goal is to maximize an unknown fun...

An Empirical Study of Neural Kernel Bandits (11/05/2021)
Neural bandits have enabled practitioners to operate efficiently on prob...

Metalearning Linear Bandits by Prior Update (07/12/2021)
Fully Bayesian approaches to sequential decision-making assume that prob...

Langevin Thompson Sampling with Logarithmic Communication: Bandits and Reinforcement Learning (06/15/2023)
Thompson sampling (TS) is widely used in sequential decision making due ...

Bayesian decision-making under misspecified priors with applications to meta-learning (07/03/2021)
Thompson sampling and other Bayesian sequential decision-making algorith...

The Elliptical Potential Lemma Revisited (10/20/2020)
This note proposes a new proof and new perspectives on the so-called Ell...
