MOTS: Minimax Optimal Thompson Sampling

03/03/2020
by   Tianyuan Jin, et al.

Thompson sampling is one of the most widely used algorithms for online decision problems, owing to its simple implementation and its strong empirical performance relative to other state-of-the-art methods. Despite its popularity and empirical success, it has remained an open problem whether Thompson sampling can achieve the minimax optimal regret O(√(KT)) for K-armed bandit problems, where T is the total time horizon. In this paper, we solve this long-standing open problem by proposing a new Thompson sampling algorithm called MOTS that adaptively truncates the sampling result of the chosen arm at each time step. We prove that this simple variant of Thompson sampling achieves the minimax optimal regret bound O(√(KT)) for a finite time horizon T, and also the asymptotically optimal regret bound as T grows to infinity. This is the first time that minimax optimality for multi-armed bandit problems has been attained by a Thompson-sampling-type algorithm.
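To make the idea of truncated sampling concrete, here is a minimal Python sketch of a clipped Gaussian Thompson sampler. The abstract does not state the truncation rule, so everything specific here is an assumption made for illustration: the function names `log_plus` and `truncated_ts`, the parameters `rho` and `alpha`, the unit-variance Gaussian rewards, the choice to clip every arm's sample, and the cap of the form "empirical mean plus a confidence width". It should not be read as the authors' exact algorithm.

```python
import math
import random


def log_plus(x):
    """Return max(0, log x); the cap's bonus vanishes once x <= 1."""
    return max(0.0, math.log(x)) if x > 0 else 0.0


def truncated_ts(means, horizon, rho=0.9, alpha=4.0, seed=0):
    """Clipped Gaussian Thompson sampling sketch; returns (pulls, regret).

    `means` are the true arm means (used only to simulate rewards and to
    measure regret). `rho` and `alpha` are assumed tuning parameters.
    """
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k      # number of times each arm was played
    sums = [0.0] * k     # sum of observed rewards per arm
    best = max(means)
    regret = 0.0

    for t in range(horizon):
        if t < k:
            arm = t  # play each arm once to initialise the estimates
        else:
            samples = []
            for i in range(k):
                mu_hat = sums[i] / pulls[i]
                # Gaussian sample centred on the empirical mean
                theta = rng.gauss(mu_hat, math.sqrt(1.0 / (rho * pulls[i])))
                # Assumed truncation level (not given in the abstract)
                width = math.sqrt(
                    alpha / pulls[i] * log_plus(horizon / (k * pulls[i]))
                )
                samples.append(min(theta, mu_hat + width))
            arm = max(range(k), key=lambda i: samples[i])

        reward = rng.gauss(means[arm], 1.0)  # assumed unit-variance rewards
        pulls[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]

    return pulls, regret
```

Calling `truncated_ts([0.0, 1.0], horizon=2000)` returns the per-arm pull counts and the cumulative (pseudo-)regret of one simulated run. Clipping only ever lowers a sample, so it limits how far an optimistic draw can overshoot the empirical mean while leaving pessimistic draws untouched.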


research
06/07/2022

Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits

We study the regret of Thompson sampling (TS) algorithms for exponential...
research
02/21/2020

Double Explore-then-Commit: Asymptotic Optimality and Beyond

We study the two-armed bandit problem with subGaussian rewards. The expl...
research
11/05/2021

Maillard Sampling: Boltzmann Exploration Done Optimally

The PhD thesis of Maillard (2013) presents a randomized algorithm for th...
research
02/09/2018

Make the Minority Great Again: First-Order Regret Bound for Contextual Bandits

Regret bounds in online learning compare the player's performance to L^*...
research
06/05/2023

Online Learning with Feedback Graphs: The True Shape of Regret

Sequential learning with feedback graphs is a natural extension of the m...
research
09/12/2014

On Minimax Optimal Offline Policy Evaluation

This paper studies the off-policy evaluation problem, where one aims to ...
research
10/01/2021

Batched Thompson Sampling

We introduce a novel anytime Batched Thompson sampling policy for multi-...
