Nonstationary Bandit Learning via Predictive Sampling

05/04/2022
by Yueyang Liu, et al.

We propose predictive sampling as an approach to selecting actions that balance between exploration and exploitation in nonstationary bandit environments. When specialized to stationary environments, predictive sampling is equivalent to Thompson sampling. However, predictive sampling is effective across a range of nonstationary environments in which Thompson sampling suffers. We establish a general information-theoretic bound on the Bayesian regret of predictive sampling. We then specialize this bound to study a modulated Bernoulli bandit environment. Our analysis highlights a key advantage of predictive sampling over Thompson sampling: predictive sampling deprioritizes investments in exploration where acquired information will quickly become less relevant.
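To make the contrast concrete, below is a minimal sketch of the vanilla Thompson sampling baseline running in a drifting Bernoulli bandit, the kind of nonstationary environment the abstract refers to. The arm count, horizon, and drift model are illustrative assumptions, not values from the paper, and the paper's predictive sampling algorithm itself is not implemented here; the sketch only shows how a stationary-minded Thompson sampler keeps accumulating posterior counts even as the information they encode goes stale.

```python
# Illustrative sketch only (not the paper's implementation): vanilla Thompson
# sampling on a Bernoulli bandit whose arm means are occasionally resampled,
# so previously acquired information quickly becomes less relevant.
import numpy as np

rng = np.random.default_rng(0)

n_arms, horizon = 3, 2000   # assumed values, for illustration
drift = 0.02                # per-step probability that an arm's mean is resampled

# Latent success probabilities; they change over time ("modulated" environment).
p = rng.uniform(size=n_arms)

# Beta posterior parameters for vanilla Thompson sampling.
alpha = np.ones(n_arms)
beta = np.ones(n_arms)

regret = 0.0
for t in range(horizon):
    # Thompson sampling: draw a mean for each arm from its posterior,
    # then play the arm with the largest draw.
    sampled = rng.beta(alpha, beta)
    arm = int(np.argmax(sampled))

    reward = float(rng.random() < p[arm])
    alpha[arm] += reward
    beta[arm] += 1.0 - reward

    regret += p.max() - p[arm]

    # Nonstationarity: occasionally resample each arm's success rate.
    resample = rng.random(n_arms) < drift
    p[resample] = rng.uniform(size=int(resample.sum()))

print(f"cumulative regret of vanilla Thompson sampling: {regret:.1f}")
```

Because the posterior counts above treat every past observation as equally informative about the present, exploration effort is spent on information that the drift soon invalidates; the abstract's point is that predictive sampling instead deprioritizes such investments.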

