Thompson Sampling on Symmetric α-Stable Bandits

07/08/2019
by Abhimanyu Dubey, et al.

Thompson Sampling provides an efficient technique to introduce prior knowledge in the multi-armed bandit problem, along with providing remarkable empirical performance. In this paper, we revisit the Thompson Sampling algorithm under rewards drawn from symmetric α-stable distributions, which are a class of heavy-tailed probability distributions utilized in finance and economics, in problems such as modeling stock prices and human behavior. We present an efficient framework for posterior inference, which leads to two algorithms for Thompson Sampling in this setting. We prove finite-time regret bounds for both algorithms, and demonstrate through a series of experiments the stronger performance of Thompson Sampling in this setting. With our results, we provide an exposition of symmetric α-stable distributions in sequential decision-making, and enable sequential Bayesian inference in applications from diverse fields in finance and complex systems that operate on heavy-tailed features.
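
To make the reward model concrete, the following is a minimal sketch of how one might draw rewards from a symmetric α-stable distribution, using the standard Chambers-Mallows-Stuck transformation. It is not taken from the paper and does not reproduce its posterior-inference framework or either of its Thompson Sampling algorithms; the function name `sample_symmetric_alpha_stable` and its parameters (`alpha`, `loc`, `scale`, `rng`) are hypothetical names chosen for illustration, and the scale convention assumed here is the one in which α = 2 yields a Gaussian with variance 2·scale².

```python
import math
import random


def sample_symmetric_alpha_stable(alpha, loc=0.0, scale=1.0, rng=random):
    """Draw one sample from a symmetric alpha-stable law.

    Illustrative sketch using the Chambers-Mallows-Stuck method with
    skewness fixed at zero. All names here are assumptions, not the
    paper's notation.
    """
    if not 0.0 < alpha <= 2.0:
        raise ValueError("alpha must lie in (0, 2]")
    # U is uniform on (-pi/2, pi/2); W is a unit-rate exponential.
    u = math.pi * (rng.random() - 0.5)
    w = rng.expovariate(1.0)
    if alpha == 1.0:
        # alpha = 1 is the Cauchy special case.
        x = math.tan(u)
    else:
        x = (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
             * (math.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))
    return loc + scale * x


# Example: heavy-tailed rewards for a hypothetical arm centred at 0.5.
rewards = [sample_symmetric_alpha_stable(1.5, loc=0.5) for _ in range(5)]
```

For α < 2 the samples have infinite variance, and for α ≤ 1 the mean is undefined as well, which is why estimators built for light-tailed rewards can behave poorly in this setting.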

research
07/23/2013

Modeling Human Decision-making in Generalized Gaussian Multi-armed Bandits

We present a formal model of human decision-making in explore-exploit ta...
research
02/07/2021

Regret Minimization in Heavy-Tailed Bandits

We revisit the classic regret-minimization problem in the stochastic mul...
research
06/07/2022

A Simple and Optimal Policy Design with Safety against Heavy-tailed Risk for Multi-armed Bandits

We design new policies that ensure both worst-case optimality for expect...
research
09/10/2017

Bayesian bandits: balancing the exploration-exploitation tradeoff via double sampling

Reinforcement learning studies how to balance exploration and exploitati...
research
07/12/2013

Thompson Sampling for 1-Dimensional Exponential Family Bandits

Thompson Sampling has been demonstrated in many complex bandit models, h...
research
05/25/2018

Myopic Bayesian Design of Experiments via Posterior Sampling and Probabilistic Programming

We design a new myopic strategy for a wide class of sequential design of...
research
03/01/2019

Metropolized Knockoff Sampling

Model-X knockoffs is a wrapper that transforms essentially any feature i...
