
Thompson Sampling with a Mixture Prior

06/10/2021
by Joey Hong, et al.

We study Thompson sampling (TS) in online decision-making problems where the uncertain environment is sampled from a mixture distribution. This setting is relevant to multi-task learning, where a learning agent faces different classes of problems. We incorporate this structure in a natural way by initializing TS with a mixture prior (an algorithm we dub MixTS) and develop a novel, general technique for analyzing the regret of TS with such priors. We apply this technique to derive Bayes regret bounds for MixTS in both linear bandits and tabular Markov decision processes (MDPs). Our regret bounds reflect the structure of the problem, depending on the number of mixture components and the confidence width of each component of the prior. Finally, we demonstrate the empirical effectiveness of MixTS in both synthetic and real-world experiments.
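
The abstract describes MixTS only at a high level. Below is a minimal, hypothetical sketch of the idea for a linear bandit with Gaussian rewards and a mixture-of-Gaussians prior: under conjugate updates the posterior remains a Gaussian mixture, so each round samples a component index from the posterior weights, samples a parameter from that component's Gaussian, and acts greedily. The dimension, arm set, noise variance, and number of components are illustrative placeholders, not values from the paper.

```python
# Sketch of MixTS for a d-dimensional linear-Gaussian bandit with a
# mixture-of-Gaussians prior over the unknown parameter theta.
# All numbers below are illustrative assumptions, not from the paper.
import numpy as np

rng = np.random.default_rng(0)

d, K, sigma2 = 2, 3, 0.25          # dimension, mixture components, noise variance
arms = rng.normal(size=(10, d))    # fixed finite action set (hypothetical)

# Mixture prior: weights w_k, means mu_k, covariances Sigma_k.
w = np.full(K, 1.0 / K)
mu = rng.normal(size=(K, d))
Sigma = np.stack([np.eye(d) for _ in range(K)])

theta_star = rng.normal(size=d)    # true parameter, unknown to the agent

for t in range(500):
    # 1) Sample a component from the posterior weights, then sample a
    #    parameter from that component's Gaussian posterior.
    k = rng.choice(K, p=w)
    theta = rng.multivariate_normal(mu[k], Sigma[k])

    # 2) Act greedily with respect to the sampled parameter.
    a = arms[np.argmax(arms @ theta)]
    r = a @ theta_star + rng.normal(scale=np.sqrt(sigma2))

    # 3) Conjugate update: reweight components by the predictive density of
    #    the observed reward, and apply a rank-one Gaussian posterior update
    #    to each component (done in probability space for clarity; log-space
    #    would be more numerically stable).
    new_w = np.empty(K)
    for j in range(K):
        pred_var = a @ Sigma[j] @ a + sigma2
        pred_mean = a @ mu[j]
        new_w[j] = w[j] * np.exp(-0.5 * (r - pred_mean) ** 2 / pred_var) / np.sqrt(pred_var)

        g = Sigma[j] @ a / pred_var
        mu[j] = mu[j] + g * (r - pred_mean)
        Sigma[j] = Sigma[j] - np.outer(g, a @ Sigma[j])

    w = new_w / new_w.sum()
```

With a single component (K = 1) this reduces to standard Gaussian linear TS; the posterior weights w let the agent concentrate on the prior component (task class) most consistent with the observed rewards, which matches the intuition behind the component-dependent regret bounds stated in the abstract.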

Related Research

11/05/2021 · Improved Regret Analysis for Variance-Adaptive Linear Bandits and Horizon-Free Linear Mixture MDPs
In online learning problems, exploiting low variance plays an important ...

06/09/2022 · Regret Bounds for Information-Directed Reinforcement Learning
Information-directed sampling (IDS) has revealed its potential as a data...

05/29/2021 · Information Directed Sampling for Sparse Linear Bandits
Stochastic sparse linear bandits offer a practical model for high-dimens...

05/21/2018 · Online Learning in Kernelized Markov Decision Processes
We consider online learning for minimizing regret in unknown, episodic M...

12/12/2022 · Autoregressive Bandits
Autoregressive processes naturally arise in a large variety of real-worl...

01/15/2019 · The Bayesian Prophet: A Low-Regret Framework for Online Decision Making
Motivated by the success of using black-box predictive algorithms as sub...